Information has very little meaning without context.
And for us to thrive in this new information world, we're going to need some help in understanding what a given piece of information might mean.
Hence the growing importance of metadata in this discussion -- information about information.
If you're just dropping in, we're in the middle of a connected sequence attempting to tie together multiple themes sweeping our industry, and our society as a whole. If you've made it this far, congratulations: we're in the home stretch.
If we step back for a moment, I've introduced this series, written about the growing need for information governance, identified information risk management as the new frontier in security, pointed to the unmet needs of knowledge workers as a crisis in the making, described how the changing nature of applications will change IT, positioned virtualization as creating the potential for "frictionless" IT, speculated a bit on how the cloud might affect us all, and shone a spotlight on the fact that -- as digital citizens -- we're going to want more control of our personal information.
And, behind all of this, perhaps our thinking around metadata might have to change in a big way.
The Value Of Metadata To IT
Metadata is nothing more than a label on a piece of information that provides a bit of meaning around it. It might tell us that a given piece of information has business value (hence, keep it around), or perhaps that it's sensitive (hence, protect it), or maybe even expose some interesting attributes for some future use.
The best analogy I use to describe the value of metadata -- at least to IT people -- is our friends at FedEx. Everything at FedEx is driven by bar code labels -- the label on each package tells you where it came from, where it's going, when it needs to be there, any special considerations, and so on.
Now, imagine if FedEx lost their labels for all their packages.
It'd be pretty hard to run their business.
But, every day, we ask IT organizations to manage an increasing mountain of data -- all without useful labels. It's hard to figure out what needs a high service level, and what doesn't. What should be retained, and for how long. What should be secured, and why.
We end up using brute force techniques that are costly and sometimes ineffective, simply because we don't have the fine-grained knowledge we need to do the job right.
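To make the label idea a bit more concrete, here's a minimal sketch of how handling decisions could be driven off labels instead of brute force. Everything in it is an assumption for illustration: the `Item` class, the label names (`business_value`, `sensitive`, `retain_years`) and the default numbers are hypothetical, not taken from any real product.

```python
from dataclasses import dataclass, field


@dataclass
class Item:
    """A piece of information plus whatever labels happen to be attached."""
    name: str
    labels: dict = field(default_factory=dict)


# Hypothetical "brute force" policy: with no labels, treat everything as
# important, sensitive and long-lived, which is safe but expensive.
BRUTE_FORCE = {"service_level": "high", "retain_years": 7, "encrypt": True}


def handling_for(item: Item) -> dict:
    """Derive service level, retention and protection from an item's labels."""
    if not item.labels:
        return dict(BRUTE_FORCE)
    return {
        "service_level": "high" if item.labels.get("business_value") == "high" else "standard",
        "retain_years": item.labels.get("retain_years", 1),
        "encrypt": bool(item.labels.get("sensitive", False)),
    }


if __name__ == "__main__":
    forecast = Item("q3-forecast.xls", {"business_value": "high", "sensitive": True, "retain_years": 5})
    scratch = Item("scratch.tmp")  # no labels at all
    print(handling_for(forecast))
    print(handling_for(scratch))
```

The point of the sketch is the fallback branch: without labels, the only safe choice is the most expensive treatment for everything.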
The Value Of Metadata To Knowledge Workers
People tell me that "search" is the killer app for knowledge workers.
I disagree. I think the killer app is "find".
All joking aside, search techniques can only get you so far in finding that needle in the haystack. Having some nice metadata around, whether a fixed taxonomy, or socially tagged, can be far more useful.
And, if we're thinking in terms of creating value from the information we already have, certainly metadata plays a large role.
Structured Vs. Dynamic Metadata
Very often, I see people get wrapped up in defining the "right" metadata. To be frank, I've gone over to the other side entirely. I've become a "metadata anarchist" to a certain degree.
I think that anytime anyone wants to hang a bit of metadata off a particular piece of information, it's a good thing. Put simply, I firmly believe there is no "right" metadata.
Metadata is (largely) in the eye of the beholder.
Sure, there are tools (like those sold by EMC and others) that will scan data, and try to make intelligent guesses as to proper tagging, e.g. this is secure, this is interesting, etc. -- but these approaches have their limitations.
Call this "structured" metadata -- defined tags with defined meaning and defined policies. It's a starting point, but it's not enough.
Social computing is turning out to be a far more useful approach for classifying information in useful ways. People are far better than automated tools at looking at something and deciding if it's important, if it's sensitive, if it might be valuable to others, and so on.
Call this "dynamic" metadata -- loosely defined tags with loosely defined meaning and loosely defined policies. It captures the other side of information -- multiple meanings in multiple contexts.
Keep in mind that the context around a given piece of information changes. An innocuous customer letter might change its status if we become embroiled in a dispute. Someone working on a new initiative might want to mine prior learnings. And, if you've ever been exposed to eDiscovery, you'll realize that, once a lawsuit is involved, how you look at information changes substantially.
A static, fixed, hierarchical approach to information classification might be a good start, but won't get you where you want to be in the long term. Humans can understand meaning and context; although it's sloppy, disorganized stuff, it's also a valuable component when thinking about metadata.
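Here's a rough sketch of what the two kinds of metadata might look like side by side. It's only an illustration under my own assumptions: the `TagStore` class, the three-word fixed vocabulary and the ranking rule (count how many different people applied a tag) are all hypothetical.

```python
from collections import defaultdict

# Hypothetical fixed taxonomy: defined tags with defined meaning.
STRUCTURED_VOCAB = {"sensitive", "retain", "public"}


class TagStore:
    def __init__(self):
        self.structured = defaultdict(set)                   # item -> vocabulary tags
        self.social = defaultdict(lambda: defaultdict(set))  # item -> tag -> users

    def classify(self, item, tag):
        """Structured metadata: only tags from the fixed vocabulary are allowed."""
        if tag not in STRUCTURED_VOCAB:
            raise ValueError(f"{tag!r} is not in the fixed vocabulary")
        self.structured[item].add(tag)

    def tag(self, item, tag, user):
        """Dynamic metadata: anyone can hang any tag on anything, at any time."""
        self.social[item][tag.lower()].add(user)

    def find(self, tag):
        """Return items carrying the tag, most-endorsed first."""
        tag = tag.lower()
        scores = {}
        for item, tags in self.social.items():
            if tag in tags:
                scores[item] = len(tags[tag])
        for item, tags in self.structured.items():
            if tag in tags:
                scores[item] = scores.get(item, 0) + 1
        return sorted(scores, key=lambda i: (-scores[i], i))


if __name__ == "__main__":
    store = TagStore()
    store.classify("letter-017", "retain")
    store.tag("letter-017", "customer", "ann")
    # Later, the context changes: the same letter becomes part of a dispute.
    store.tag("letter-017", "dispute", "bob")
    store.tag("letter-017", "dispute", "carol")
    print(store.find("dispute"))
```

Nothing about the letter itself changed between the first tag and the last; only the context did, and the dynamic side of the store can absorb that without anyone redesigning the taxonomy.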
Storing And Representing Metadata
EMC first got really excited about this idea when we launched Centera in 2003. For the first time, we had a method to store flexible metadata with the object itself. Theoretically, you could imagine different functions taking different passes over a given piece of information, each adding a tag or two that captured meaning in a specific context.
But, alas, no one made extensive use of this feature, although many customers did use it for things like retention periods, or to track where a specific piece of information came from.
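The "different passes, each adding a tag or two" idea can be sketched in a few lines. To be clear, this is not the Centera API or anything like it; `StoredObject`, the three pass functions and the values they write are hypothetical names I've made up to show metadata travelling with the object, grouped by the context that added it.

```python
from dataclasses import dataclass, field


@dataclass
class StoredObject:
    """Content plus metadata kept alongside it, grouped by context."""
    content: bytes
    metadata: dict = field(default_factory=dict)  # context -> {key: value}

    def annotate(self, context, **tags):
        """Add or update tags under one context without touching the others."""
        self.metadata.setdefault(context, {}).update(tags)


# Each pass looks at the object from one perspective and adds its own tags.
def retention_pass(obj):
    obj.annotate("retention", years=7)  # assumed default retention period


def provenance_pass(obj):
    obj.annotate("provenance", source="mail-gateway")  # assumed origin


def security_pass(obj):
    if b"confidential" in obj.content.lower():
        obj.annotate("security", sensitive=True)


if __name__ == "__main__":
    obj = StoredObject(b"CONFIDENTIAL: pricing for customer X")
    for run_pass in (retention_pass, provenance_pass, security_pass):
        run_pass(obj)
    print(obj.metadata)
```

Keeping each context in its own bucket means a new function can add its view later without needing to know what the earlier passes wrote.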
We got excited about metadata again when we acquired Documentum. If you've spent any time with Documentum, you'll know that one way to view it is as a metadata management system for unstructured content: powerful tools to populate different forms of metadata, plus all sorts of workflow, search, portals, etc. that you can drive off the other end.
Some of our customers got this perspective; others were thinking more in terms of building specific applications, rather than an enterprise-wide approach -- but I think this is starting to change.
I personally got excited about metadata again when I started to notice the power of our internal social media platform that we put up last September. People started tagging stuff, and we all noticed that tagging (however imprecise) was far better at finding (and understanding) information than any hierarchical taxonomy, or even search, for that matter.
And, finally, this week the industry stood up and rallied around XAM -- an industry-standard extension of the Centera API concept, broadly supported by multiple vendors. Now, for the first time, we have a generic way of capturing all sorts of metadata, from all sorts of sources, and it isn't tied to any one vendor's offering.
Tres cool. Steve Todd wrote a bit about this, if you're interested.
So, What Does All Of This Mean?
If we're going to be buried in an avalanche of information, we're going to need some help to figure out what it all means from different perspectives -- hence the importance of metadata.
Although our natural temptation is to think in terms of structured, static metadata, that's too limiting -- we need to think in terms of a balanced approach that complements the structured approach with contributions from social computing.
Metadata needs to serve multiple masters, and its mission will evolve over time, so think in terms of an architectural approach, rather than simple, purpose-built implementations.
And, finally, I think that, before too long, we'll live in a world where most information has its own set of labels -- just like FedEx packages.