I'm sure you've been following the convoluted tale of Rep. Foley. And now, it's spilled over into the wider arena of power politics, who knew what when, and so on. Maybe certain people knew, maybe they didn't -- but there was a clear, digital trail of the behavior for quite a while before the pot boiled over.
Without commenting on specifics, there's a lesson here for others -- more often than not, it's what you DON'T know that can really get you in trouble.
<begin product plug>
I'm going to talk about a relatively new EMC product. Not because that's my job, but because it illustrates a response to a growing problem in IT -- you're going to be held responsible for stuff you may not know about.
The product is EMC Infoscape. If you're curious, look here for more information.
If you don't want to wade through all the marketing gloop, basically what it does is:
(1) discover all the potential file servers in your environment,
(2) crawl all of the files looking for keywords, patterns or other things of interest,
(3) run what it finds through a policy engine that makes decisions about what to do, and finally
(4) execute that policy -- move it, delete it, protect it, log it, retain it, load it into a repository, call the HR guys, call the legal department, and so on.
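To make the four steps a bit more concrete, here's a rough Python sketch of the crawl, scan, decide and execute flow. This is purely my own illustration and not how Infoscape is built: the keyword list, the action names and every function name are made up, discovery (step 1) is assumed to have already produced a list of share roots, and "executing" a policy here just records which action would be taken.

```python
import os
from dataclasses import dataclass

# Hypothetical keyword list; a real deployment would use far richer rules.
KEYWORDS = ("confidential", "account number")


@dataclass
class Finding:
    path: str
    hits: list


def crawl(roots):
    """Step 2a: yield every file path under the given share roots."""
    for root in roots:
        for dirpath, _dirnames, filenames in os.walk(root):
            for name in sorted(filenames):
                yield os.path.join(dirpath, name)


def scan(path):
    """Step 2b: return the keywords found in one file (case-insensitive)."""
    with open(path, "r", encoding="utf-8", errors="ignore") as handle:
        text = handle.read().lower()
    return Finding(path, [kw for kw in KEYWORDS if kw in text])


def decide(finding):
    """Step 3: toy policy engine mapping a finding to an action name."""
    if "confidential" in finding.hits:
        return "move_to_secure_share"
    if finding.hits:
        return "flag_for_review"
    return "leave_alone"


def run_pipeline(roots):
    """Steps 2-4 for already-discovered roots; the action is only recorded."""
    return {f.path: decide(f) for f in map(scan, crawl(roots))}
```

Calling `run_pipeline(["/some/share"])` (a placeholder path) returns a dictionary mapping each file to the action the toy policy picked, which is about the simplest possible stand-in for the "log it" part of step 4.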
It's a pretty neat integration of multiple technologies we've acquired over the years, but it focuses on a particular use case that's becoming more frequent.
Imagine you run IT for a bank (or an insurance company, or a retailer, and so on). Lots of stuff in those public shares, right? There are about a bazillion files out there. Some are important. Most aren't. And there are a few that might contain things that can get you in serious trouble, like confidential information, or information that can be used to identify a specific person or account.
You'd like to move the junk out, but you'd also like to look for potential hand grenades out there. So here's what you might need to think about:
Observation #1 -- You Can't Manage What You Don't Know About
The product includes a discovery capability -- it looks on the network and walks backwards to find all the file servers in your environment.
You do know about every file server in your environment, don't you?
In smaller shops, the answer is "of course", but in larger shops, that can be a bit more difficult. During the beta tests, we found that there were almost always a few file servers that someone had forgotten about. Even if you have things pretty much under control, it's nice to be able to prove it!
Observation #2 -- People Are Human
We all know we're not supposed to leave customer social security numbers lying around in files and spreadsheets, or customer account information, or other sensitive/inappropriate information. But it happens no matter how hard you try. And, without too much imagination, you can easily think of a scenario where each one is a potential bomb waiting to go off.
During one beta, not only did they find more than a few file servers they didn't know about, but when they started trawling through open file shares, they were a bit horrified to see the kinds of things people leave on file shares.
Yes, there was the usual inappropriate stuff (doh!), but there were some real humdingers in there that could have had serious legal and financial consequences if they weren't managed appropriately.
Not a few dozen files. Not a few hundred. Literally thousands and thousands.
It was a sobering moment all around ...
Observation #3 -- It Ain't Gonna Work Unless It's Automatic
The only way to reasonably approach this sort of task is automation. I know I wouldn't want to be the guy looking at every file on every file share. But "automation" doesn't really capture the spirit of what's going on here -- the English language is failing me.
Automatic discovery of new file sources. Automatic scanning and classification. Automatic extension to keywords and rules based on exception. Automatic execution of policy (move, delete, archive, protect, flag, etc.). Automatic loading into a searchable repository, if you're so inclined. And automatic reporting of what it's doing and what it's found.
Lashing together multiple tools to do all this is achievable, but hardly practical. And unless IT can push a button and say "go do!" it ain't gonna fly in this use case. And for the more daunting challenges around information management, customers are going to need this kind of use-case automation.
Observation #4 -- Use Information To Tell The Infrastructure What To Do
Automation is nice, but who's going to tell the automation what to do? For information management, the answer is turning out to be: use the information itself to drive policy and automation. And Infoscape is a classic example of that.
Simple examples might be "found the word CONFIDENTIAL" triggering the file being moved to a more secure environment than a public share. Or "found something that looks like a social security number" being flagged as a confidentiality violation.
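Here's roughly what those two simple examples could look like as rules in Python. Again, this is a toy of my own and not the product's rules engine: the rule names, the action labels and the `evaluate` function are all hypothetical, and the number pattern only catches the common 123-45-6789 layout.

```python
import re

# Hypothetical rules: (rule name, compiled pattern, action label).
RULES = [
    ("confidential_marker",
     re.compile(r"\bCONFIDENTIAL\b"),
     "move_to_secure_share"),
    ("possible_ssn",
     re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
     "flag_confidentiality_violation"),
]


def evaluate(text):
    """Return (rule name, action) for every rule whose pattern appears in text."""
    return [(name, action) for name, pattern, action in RULES
            if pattern.search(text)]
```

A file containing both the marker and something shaped like a social security number trips both rules, and the policy side gets to decide which action wins. Rules this naive will throw plenty of false positives and misses, which is exactly why a real deployment needs something much smarter.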
(no, I'm not going to give you keyword examples from the Rep Foley case, you'll just have to use your own imagination)
Static rules won't work. Simple metadata won't work. Googling for keywords won't do it for you, either. You're going to need a pretty sophisticated rules engine and pattern matching to make it all work, which is what we've done here. And there's going to have to be a neat feedback loop with a skilled observer to help tune the rules engine.
Observation #5 -- Tactical Solutions Should Lead To Strategic Opportunities
OK, so you've crawled every file and searched for relevant phrases and patterns. Now what?
How about loading the interesting bits into a repository to be managed? Maybe add enterprise search to the mix? How about harnessing that information to give 360 degree views of customers, business processes and the like? Workflow? Collaboration? Whoa!!
(that's Documentum, in case you missed it :-)
Put differently, one of the big challenges in enterprise content management is getting people to fill out metadata forms so the important files can be categorized and classified appropriately. Anyone who's tried to get users to do this ends up with all sorts of files categorized under "stuff" or "other", which kind of defeats the purpose.
OK, now we'll automate that process with the same tool we used before, and now we can build a knowledge base of useful, interesting documents that have some half-decent metadata pre-extracted.
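As a rough idea of what "pre-extracted metadata" might mean, here's a small Python sketch that guesses a category from the words in a document instead of waiting for a user to pick "stuff". The category vocabulary, the field names and `extract_metadata` are all invented for illustration; real classification would be considerably more involved.

```python
import re
from collections import Counter

# Hypothetical category vocabulary; real systems would learn or curate this.
CATEGORY_TERMS = {
    "contracts": {"agreement", "party", "terms"},
    "finance": {"invoice", "payment", "balance"},
}


def extract_metadata(filename, text):
    """Build a small metadata record from a file's name and text content."""
    words = Counter(re.findall(r"[a-z]+", text.lower()))
    # Score each category by how often its terms occur in the text.
    scores = {cat: sum(words[term] for term in terms)
              for cat, terms in CATEGORY_TERMS.items()}
    best = max(scores, key=scores.get)
    category = best if scores[best] > 0 else "uncategorized"
    return {
        "filename": filename,
        "category": category,
        "word_count": sum(words.values()),
    }
```

Anything that matches no category stays "uncategorized", which is at least an honest label and a natural queue for a human to review.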
The idea is simple -- one solution should lead to another, and that's demonstrated here.
So, whether you think this is a good product or not, or whether you have this particular problem or not, think about this recurring theme: ultimately, IT will be responsible for understanding the value (or risk!) of their information, and acting appropriately.
<end product plug>
