Monday, August 13, 2007

 

"Individual," "Cultural," and "Institutional" Categorization

Last week at the annual meeting of the Cognitive Science Society, I organized and led off a symposium titled "Semantics in the Wild" with Paul Maglio (IBM Almaden), Teenie Matlock (UC Merced), and Larry Barsalou (Emory University). Our goal in this set of talks was to challenge the field of cognitive science to take a more comprehensive view of categorization and classification. These fundamental cognitive activities have been thought about for millennia (e.g., Plato vs Aristotle on whether knowledge is objective or empirical) and studied in the psychology laboratory for a few decades (e.g., reaction time measurements on judgments of category membership, like comparing the time to answer "is a robin a bird?" vs "is a penguin a bird?"). The four of us got together for this symposium because our own perspectives on categorization are very different and complementary -- while I have spent many years doing "business semantics" and "document engineering," Larry has published dozens of papers on categorization studies in his psychology laboratory, Paul has studied how people manage information inside large organizations, and Teenie has taken a psycholinguistic approach.

Paul, Teenie, Larry and I pointed out that in today's world of ubiquitous computing and ubiquitous information resources, we interact daily as individuals and as participants in organizational processes with a bewildering variety of information types, and we constantly make choices about whether and how to categorize them. So we're proposing to broaden the scope of research on categorization to study the explicit activities by individuals to classify web resources (e.g., flickr, del.icio.us, …) and institutional efforts to define and deploy category systems to achieve business and organizational objectives.

Our fundamental claim is that these different kinds of categorization and classification activities or systems lie in a continuous multidimensional space where we can identify three important regions:

Cultural Categorization Systems

Individual Categorization (aka "Tagging")

Institutional Categorization (aka "Business Semantics")

CULTURAL categorization systems are the traditional subject matter for research. These are acquired implicitly through development via parent-child interactions, language, and experience. Formal education can build on this, but the non-formal cultural system can often dominate.

INDIVIDUAL categorization systems are developed by an individual for organizing a personal domain to aid memory, retrieval, or usage. These can serve social goals to convey information, develop a community, or manage reputation. Individual categorization systems have always existed, but they have exploded with the advent of cyberspace, especially in applications based on "tagging."

INSTITUTIONAL categorization systems involve the explicit construction of a semantic model of a domain to enable more control, robustness, and interoperability than is possible with just the cultural system. They are often the collaborative artifact of many individuals who represent different organizational or business perspectives, and they are usually developed via rigorous and formal processes (e.g., in standards organizations like OASIS, where I'm a member of the Board of Directors). Finally, institutional categorization systems require ongoing governance and maintenance because of continuous changes taking place in related cultural and individual systems.

We frankly admit that our thinking isn't fully developed, but it seems that there are many very interesting and important issues to study when you take this broader view of categorization. In particular, we see a number of dimensions or tradeoffs that define the space of categorization activities, such as:

Explicitness vs implicitness
Semantic rigor
Effort to acquire and use
Individual vs group goals
Amount of reuse of other categorization systems
Nature and rate of change over time

This fall in my Information Organization and Retrieval course at UC Berkeley, I'll be using this new framework, and I think it will help students understand better how tagging by individuals in flickr or del.icio.us compares to the "institutional tagging" of business information in standard product classification systems like the United Nations Standard Products and Services Code (UNSPSC) or business vocabularies like the Universal Business Language (UBL).

- Bob Glushko






