Rich Media and Categorization Standards

One of the good news/bad news developments playing out with social networks revolves around the vast amount of data being created and uploaded every minute. On one level the model works; sites like Facebook, MySpace, Hi5, etc. are pulling in members at enviable rates, and more importantly, those members are active users of a broad range of rich media technologies. User-generated content is the key driver of success for social networks, and it is being generated in staggering volumes. That’s the good news.

The not so good news is that this content is poorly organized; the vast majority of people uploading rich media files onto social networks haven’t got the slightest idea of what metadata or vertical taxonomies are, much less how to classify what is being uploaded. While taxonomies or metadata may sound like wonk-speak to most people, they are a core requirement if anyone plans to find anything on a social network website.

By comparison, most content generated in a corporate setting is created by professionals who categorize the information, either manually or using applications delivered by content management systems. This works because most corporations have a vertical taxonomy that is specific to their use of language; pharmaceutical companies, chemical manufacturers, medical device manufacturers, etc. all use language that is specific to what they do. The information is categorized according to the organizational rules for that taxonomy, on the assumption that easy access is the key deliverable for any content generated.

This model works fairly well for text-centric content in a structured corporate setting, but less so in an unstructured social setting, and even less so for rich media such as videos, audio files, and ad hoc web pages (think of anything on Facebook). The social scenario is further complicated by the fact that users create rich media content, transfer it to their computer, then upload it again to, for example, a photo site like Flickr or Photobucket, where it is then shared far and wide across a broad range of applications and networks, and/or is subsequently syndicated.

So the challenge here is this: how can rich media be categorized in a semi-automatic fashion, using tools that are easy enough that any Facebook user will intuitively start categorizing their data, ideally without even knowing they’re doing it? And this only covers search within the first place the data lands after it leaves the user’s computer. How about all those folks trying to syndicate videos, where there are multiple layers of use and re-use? Without consistent categorization, using distribution tools like RSS feeds to syndicate data across a broad range of integrated social networks is like firing into the dark.

And finally, who is in the best position to drive the development and implementation of a standard to define categorization of rich media? It won’t be the end users; they’ll just move on if things don’t work the way they’re supposed to. Standards bodies are a viable choice; several, like OASIS, are already driving initiatives across a broad range of content schemas like DITA, so this would be a natural fit for them. However, the sector that really has its neck stuck out is the social networks themselves. The development of categorization standards for social networks goes beyond basic exchange of information (for example, OpenSocial), and needs to focus on core value deliverables such as search and syndication. A social network’s value is in its content; that’s the whole point of the network. If millions of users can’t find anything, and can’t find a graceful way to distribute what’s been uploaded across the multiple social sites to which most of them belong, the entire thing will eventually collapse under its own weight.