Alive and well, thank you ...
EMC started aggressively promoting the idea of ILM (information lifecycle management) maybe four years ago.
I thought it'd be useful to offer a bit of a retrospective on the concept: where it's worked, where it hasn't, how it's evolved over the years, and where it might go from here.
It's turned out to be one of those divisive topics in the blogosphere, so I'm expecting a bit of commentary to follow ...
Giving credit where credit is due, I think StorageTek was the first company to use the term in public. Their concept of ILM had a lot to do with managing information on tape.
Several years ago, EMC took the concept and evolved it in a different direction.
The original premise was that storage was expensive. And, if you could identify different service levels for information, you could potentially save money by putting the right information on the right service level at the right time.
Simple enough, no? Well, yes and no ...
The Early Years
At that time, low-cost ATA drives were just starting to come into the market. Compared to their FC counterparts, you got more storage for your money, but they didn't offer the same service level as FC drives. Basically, a new choice compared to fast disk and slow tape.
We put them in CLARiiONs, and DMX, and we had Centera, and we were building a CLARiiON Disk Library. And we could see that low-cost disk was gonna be big.
Why? Well, we knew our customers were keeping way too much information on high-service-level disk storage.
Tape was the only other realistic option -- and the costs and hassles of retrieving information from tape in a timely manner meant that if there was any chance the information was going to be needed on a semi-frequent basis, it made sense to keep it on the expensive stuff.
More tiers = more choices = better cost efficiency.
Now, remember this can work both ways. Yes, you can save money by moving information from a higher to a lower service level. But you can also improve service levels by moving information from a lower to a higher service level. This meant that stuff that was on tape that really shouldn't have been could be more cost-effectively moved to disk.
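To make the economics concrete, here's a back-of-the-envelope cost model. The per-GB figures are made-up illustrative numbers, not actual FC/ATA/tape pricing from the period -- the point is just that matching information to tiers changes the blended cost.

```python
# Hypothetical per-GB monthly costs for three tiers (illustrative only).
TIER_COST = {"fc": 1.00, "ata": 0.35, "tape": 0.05}

def blended_cost(allocation):
    """Total monthly cost for a {tier: gigabytes} allocation."""
    return sum(TIER_COST[tier] * gb for tier, gb in allocation.items())

# Everything parked on fast disk ...
before = blended_cost({"fc": 10_000})
# ... versus the same 10 TB placed by actual service-level need.
after = blended_cost({"fc": 3_000, "ata": 5_000, "tape": 2_000})

print(f"before: ${before:,.0f}/mo, after: ${after:,.0f}/mo")
```

And the same arithmetic works in the other direction: promoting information from tape to low-cost disk raises the blended cost only slightly while buying a much better service level.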
The 3-Phase ILM Model
Clearly, ILM was going to be a progression with customers, so we had to come up with a simple way of describing how ILM progressed in customer terms. We ended up with a fairly simple 3-part model that became a de-facto standard in the industry.
The first phase was tiering, the second phase was attacking large apps, and the third phase was an integrated environment. I'll give you a bit more color as we move along here.
The First Phase of ILM -- Tiering Service Levels
The first phase in the ILM model was establishing tiered service levels. Hard to tier information unless you had tiers to do it on, right?
This quickly led to the notion of information classification -- the idea of establishing a simple schema to figure out -- at least from a storage perspective -- what buckets of information you had, what were the expected service levels for each (performance, availability, etc.) and what was the expected cost for each.
Sounds like a trivially simple idea, but it turned out to be extremely powerful. Almost no IT organization had gone through this bucketing exercise, so the whole problem was being tackled in a very ad-hoc fashion.
More importantly, the business users had no idea about how much information they had, how much it cost to store and manage, and so on.
Just creating a few buckets with associated costs meant that IT could have an entirely more sane discussion with the business. Didn't matter who the vendor was behind the scenes, it became a service delivery discussion between the business and IT, and then between IT and their vendors.
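A bucketing schema like the one described above can be sketched in a few lines. The class names, availability targets, latencies and costs below are all hypothetical placeholders -- real engagements used whatever buckets made sense for that customer.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class StorageClass:
    name: str            # the "bucket"
    availability: float  # expected uptime, e.g. 0.999
    max_latency_ms: int  # expected response time for this tier
    cost_per_gb: float   # fully loaded monthly cost (illustrative)

# A deliberately tiny, hypothetical schema.
SCHEMA = [
    StorageClass("mission-critical",   0.9999, 5,   1.00),
    StorageClass("business-important", 0.999,  20,  0.40),
    StorageClass("archive",            0.99,   500, 0.05),
]

def classify(required_latency_ms: int) -> StorageClass:
    """Pick the cheapest class that still meets the latency requirement."""
    candidates = [c for c in SCHEMA if c.max_latency_ms <= required_latency_ms]
    return min(candidates, key=lambda c: c.cost_per_gb)
```

Even a toy schema like this gives the business and IT a shared vocabulary: a request stops being "buy me more DMX" and becomes "I need 2 TB of business-important at $0.40/GB."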
Turned out to be a big step forward for us, too. Hard to figure out if a DMX or a CLARiiON or a Centera is the right platform for you if you don't have some sort of parameters around expected service levels.
The next part of the first phase was building your environment. You usually needed some different tiers that you didn't have, you needed a way of migrating information from tier to tier, and you needed some nice management tools to run it all. Simple enough, no?
Not quite. As we got into this, we discovered a scary, awful truth: most IT shops didn't know how to manage storage at scale. There was a core of methodology and discipline that might have existed for managing servers, or networks -- but not storage.
We'd analyze the environment and suggest classes and tiers. We'd build and migrate the environment to the desired end state. We'd turn it over, and then the problems began.
We realized we had to take a more active role in helping our customers use the technology effectively. We found ourselves creating certification courses, and role descriptions, and run books for our customers. We created analysis tools and processes to figure out if there was even a case to do anything at all. Later on, we got into the business of providing "storage residencies" that had the core process skills required to manage storage at scale.
And we found that there was a large number of customers who didn't want to manage storage at all, and preferred us to do it.
So, the first simple phase of ILM actually got a bit more complex than we envisioned at the outset. But we pursued it with vigor, and as we learned more, we did more. After a few years, the basic motions were in place.
And, to this day, we routinely engage with customers that realize that their storage has gotten out of control, and see value in the first phase of ILM activities: (1) assessment and design, (2) migration and management, and (3) organizational transformation to use it effectively.
Note: just exploring the last few paragraphs would result in at least a dozen related blog posts, which I might come back to later. I'm trying to draw a big picture here, so once again I'm dramatically oversimplifying.
What We Learned During Phase 1 of ILM
Back when different storage price points meant different arrays (e.g. DMX vs. CX vs. Centera et al.), it became really easy to introduce more complexity into a customer's environment. Lots of boxes means lots of management points. So we learned the trick of doing the analysis up front to see if the exercise really made sense or not.
Turned out that in several environments, ILM didn't make sense. There wasn't enough scale to justify the exercise, or most of the information was really high-service level, or there was value to the simplicity of the management environment, and so on.
Not everyone's cup of tea, but still very popular.
This changed substantially as more and more of the array products began introducing simultaneous use of high-service level, mid-service level and low-service level media.
The DMX, CX and Celerra have done this for some time now. Mix and match different media types in the same box, which makes the management problem much, much simpler, and makes the concept of tiered storage much more appealing for a much wider audience.
We also learned that sometimes you could predict a service level, and sometimes you couldn't. Customers needed the flexibility to move things around in a hurry if something became important all of a sudden.
We also learned that professional services were an absolute key ingredient at every part of the customer engagement. Maybe a customer could figure out a few pieces of the puzzle, but almost no one could crack the code end-to-end.
And it wasn't too long before we started seeing some shiny success stories. Customers who cut storage growth by 20-50%. Opex and capex costs down 15-45%.
All this was nice, but what was probably even better is that IT organizations had taken a big step forward in owning and managing information. They could articulate different classes of information, the rough value to the business, and align it to a tiered infrastructure.
That's big, and continues to this day.
Phase 2 of ILM -- Applications
The first phase of ILM really looked at applications as big, invisible buckets -- it didn't look inside applications and try to ILM-ize the contents. The second phase was a collective approach to look at the big, honkin' applications (email, database, files) and try to make decisions within the category.
The party got started with e-mail.
A few years ago, we noticed that everyone (including EMC) was getting buried in email, and -- more importantly -- the mongo attachments people routinely were slinging around.
Through our Legato acquisition, we had a pretty good product (emailXtender) that we could add to the ILM discussion. Very simply, emailXtender would identify the candidates for archiving, and pull them off the primary email servers in such a way that users didn't notice.
Email instances got much smaller. Server load was dramatically reduced. Network traffic was slashed. Backups became trivial. Single instancing for repeated attachments slashed storage costs.
And so on -- a big cost saver that continues to this day.
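The single-instancing idea mentioned above is worth a quick sketch. This isn't how emailXtender is actually built -- it's just a generic content-addressed store that shows why repeated attachments stop costing you: identical content is stored once, keyed by its hash, and each message keeps only a pointer.

```python
import hashlib

class SingleInstanceStore:
    """Toy content-addressed store: identical attachments stored once."""

    def __init__(self):
        self.blobs = {}  # sha256 hex digest -> bytes (physical copies)
        self.refs = 0    # logical copies archived

    def put(self, data: bytes) -> str:
        digest = hashlib.sha256(data).hexdigest()
        self.blobs.setdefault(digest, data)  # store physically only if new
        self.refs += 1
        return digest                        # the message keeps this pointer

    def physical_bytes(self) -> int:
        return sum(len(b) for b in self.blobs.values())

# The same 1 MB deck mailed to 50 people costs 1 MB of storage, not 50 MB.
store = SingleInstanceStore()
deck = b"x" * 1_000_000
for _ in range(50):
    store.put(deck)
```

Fifty logical copies, one physical megabyte -- that's the whole trick, and it's why email was such an easy first win.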
But as we got into this market, we noticed a more powerful driver that was coming to the table -- and that was compliance. It started with financial firms, but spread quickly to other industries before long. Email had become a regulated form of information. Specific emails had to be saved and produced on demand, otherwise big penalties would apply.
This sounds like a minor thing, but in reality it was a seismic shift. ILM (in the case of email) wasn't just about cost reduction -- it was now about staying out of trouble. And the driving force was coming from outside IT -- from the legal guys.
Another trend that continues to this day.
EMC also made another run at application archiving at the time -- large databases. We OEM'd a product we sold as dbXtender, but it ran into some subtle market dynamics.
On paper, this stuff looks cool. It analyzes your database, and identifies the tables, etc. that are candidates for archiving. It "jacks up" the data dictionary, and creates sub-schemas that are invisible to your apps that manage different tiers of database information.
Extremely interesting technology, right?
Until we tried it with customers. In large companies, big databases get used many different ways by many parts of the company. Sure, this data here is idle to this group, but this other group is hammering it constantly. We'd hang the analysis tools off a database, let it run for several months, and come back to find out that there really wasn't much of an opportunity to save much money on storage.
Sure, we had some high-visibility successes (including internally at EMC), but the hit-rate wasn't enough to make it a big deal, market-wise.
Since then, we've come to the perspective that understanding how to move information probably requires more than just looking at access rates, at least for databases. To move information effectively, there has to be some sort of knowledge about how the information is created and how it's used during its lifecycle, and that's kind of an application discussion.
We still do a fair amount of database archiving for Oracle, and SAP, and SQLserver and the like -- but most of it is retention and compliance oriented, rather than "let's shrink our use of expensive storage".
The final entry in this category is file systems, and here I think we've struck gold.
Sooner or later, just about everything ends up in file systems. No one pays much attention to them until they get really, really big or really, really slow. Well, not surprisingly, that's happening more and more frequently.
A company does a quick assessment of storage utilization, and finds that over half is supporting various file systems, and no one can really say what's going on with them.
We've had file system assessment tools for a while (VisualSRM, for example). We've had file system archiving tools, like DiskXtender, for a while as well. We've put enablers in our NAS products to integrate with third-party policy engines (EMC FileMover) for a while. We've even started to offer file virtualization as an archiving appliance (e.g. EMC Rainfinity).
But all of those work pretty much the same way -- they look at external metadata (name, date, access, creator, etc.) and try and offer up some easy ways to save money on file storage.
And my opinion is that all of them have been pretty durn successful as such things go.
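The external-metadata approach described above is easy to illustrate. The sketch below is not any particular EMC product's logic -- just a generic policy that walks a share and nominates files whose last access is older than some threshold, using nothing but path, size, and timestamps.

```python
import os
import time

def archive_candidates(root: str, idle_days: int = 180):
    """Yield (path, size) for files not accessed in idle_days.

    Uses only external metadata (path, size, atime) -- no look at the
    content, which is exactly the limitation noted above.
    """
    cutoff = time.time() - idle_days * 86_400
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                st = os.stat(path)
            except OSError:
                continue  # file vanished or unreadable; skip it
            if st.st_atime < cutoff:
                yield path, st.st_size
```

Point it at a bloated share and you get a candidate list; what you can't get this way is any idea of what's *inside* those files -- which is where the next part of the story comes in.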
But previously, I mentioned that -- in the email world -- the need to understand what's in things for risk-reduction reasons was a whole new driver. And it's happening in this space as well.
About a year ago, EMC introduced InfoScape, which will crawl filers and analyze files. When we launched this product, I think we were thinking "save money". It turned out the real need was "reduce risk".
Companies are starting to realize that everything ends up in those public file shares, including stuff that really shouldn't be there. InfoScape will find it for you, and let you decide what to do with it in an automated fashion.
Do you know what's in your file systems?
Putting a perspective on Phase 2, there are some big themes here. Email and file systems are the big targets for cost reduction, with email being the more obvious of the two. Database archiving is worth looking at, but don't be surprised if there's not a huge case for cost savings.
More importantly, what started out as an exercise in cost reduction morphed into an exercise about risk reduction -- and that's a powerful theme to take note of.
Phase 3 ILM -- Integrated ILM
If you step back and think about ILM for a moment, it's not long before you realize that the key is metadata. If you have a nice characterization of a given piece of information around its value to the business, it becomes an exercise in policy and automation to figure out what service level, how long to keep it, who should see it and who should not, and so on.
And you'd like that metadata to be used for just about everything: emails, powerpoints, database reports, voice mail, etc. It's not hard to envision a world where every hunk of data has a neat label associated with it that helps to do this (hence XAM).
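To make that vision concrete, here's a toy policy engine. The labels, tiers, retention periods and reader groups are all hypothetical -- the point is only that once a label travels with each hunk of data, placement, retention and access all fall out of a table lookup.

```python
from datetime import date

# Hypothetical policies keyed by a classification label that travels
# with each piece of information (the "neat label" above).
POLICY = {
    "financial-record": {"tier": "archive-worm", "retain_years": 7,
                         "readers": {"finance", "legal"}},
    "project-doc":      {"tier": "midrange", "retain_years": 3,
                         "readers": {"employees"}},
    "scratch":          {"tier": "low-cost", "retain_years": 0,
                         "readers": {"owner"}},
}

def placement(label: str) -> str:
    """Which storage tier the label dictates."""
    return POLICY[label]["tier"]

def dispose_after(label: str, created: date) -> date:
    """Earliest date policy allows this item to be deleted."""
    years = POLICY[label]["retain_years"]
    return created.replace(year=created.year + years)

def may_read(label: str, group: str) -> bool:
    """Is this group allowed to see items with this label?"""
    return group in POLICY[label]["readers"]
```

Everything interesting lives in the table, not the code -- which is why getting the metadata attached in the first place is the hard part.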
Well, what do we do until then?
One interesting thread that EMC pursued was using Documentum to achieve this. Documentum has an excellent set of capabilities for managing and organizing metadata for rich content, tools to extract metadata from all sorts of information types, an ability to virtualize existing repositories, and so on.
A few years ago, we came up with a layer (Documentum Content Storage Services) that did just that. We were able to use the Documentum-managed metadata to decide all the interesting bits about downstream information management. Yes, it was a pain to set up, but it demonstrated the concept.
Well, it didn't really set the world on fire. People saw the value, that's for sure, but didn't rush to implement it the way they did with the other sorts of ILM software described above.
I think that we were a bit too early with the concept. Trust me, it'll be important in the future.
So Where Are We?
Big concepts are like a tree. You plant them, water them, they grow and branch off into different directions.
On the whole, ILM (as defined here) has been incredibly successful. Hardly a customer meeting goes by where we don't talk about one aspect of ILM or another. And everyone (yes everyone) is doing one aspect or another of what I've described here.
Let's take the Phase 1 discussion -- what's hot now?
Let's see, there are three new service levels to consider: flash memory (tier 0?), thin provisioning, and deduplicated backup and/or storage.
Each will present new service-level choices in the landscape, so any customer who's gotten comfortable with classifying information and service levels will know where these technologies fit, or don't fit, as the case may be.
In the protection space, backup to disk, CDP and the like are even more choices to consider. And as security technologies find their way into the storage infrastructure, the ability to classify and tier information will be a very useful skill.
ITIL for storage continues to be a very hot topic. More and more companies are realizing that information amounts are exploding, which means that storage as a part of IT spend is growing faster than they'd like, which means that they've got to get better at managing the storage environment.
There's more, but this branch of the ILM tree continues to thrive and grow, and shows no sign of slowing down.
If we look at the various Phase 2 discussions, there's vibrant growth as well.
Email archiving has morphed into strong interest around SharePoint, the next big thing in this space. File system analysis and management has become incredibly popular in just the last year or so.
The database and app vendors (Oracle, SAP, Microsoft, et al.) are starting to introduce capabilities that allow applications or databases to make decisions about when information should be moved to different service levels -- a trend we'll see more of, I think.
More importantly, I think more and more customers have realized that information classification (what's really going on here) is becoming more important.
Pick up a piece of information and ask three questions: can I save money, make money, or stay out of trouble? And rather than multiple archiving applications, they're starting to consider things more holistically, which I think is a good thing.
Again, there's more, but this is a very healthy branch of the ILM tree as well.
It's the Phase 3 discussion that really hasn't taken off yet -- unless you know where to look.
Sure, you don't see much discussion in the press, but certain industries are making very substantial investments to tag all of their information and manage it appropriately.
And they're not doing it to save money, they're doing it to avoid risk and leverage their information assets. I'm reluctant to say more, because I think these people see this investment as a competitive advantage, and wouldn't appreciate me going into too much detail.
The more visible hot spot is eDiscovery. The legal department needs to tag and analyze all manner of information (not just emails!) to better support their mission. And we've seen a big uptick in projects designed to address that need.
I'm going to give it another year, and see how this branch of the tree is going.
The Big Conclusion
There might be pedantic debate about "what is ILM" and what it is not, but I don't look at it that way.
I'll get back to my key themes.
Information is becoming the single most important asset in our economy. With every passing day, there's much more of it. And with every passing day, it has the increased potential to cost you money, make you money, or get you in a whole lot of trouble.
Information is so important, it's going to need an owner, the way a CFO owns money, or HR owns personnel.
I think IT will need to evolve to an ownership role regarding information policy and management. And it's started already. ILM is but one manifestation of these trends.
And if you understand those thoughts, you'll understand why EMC took ILM so seriously, and why we've invested in information infrastructure, which is the logical evolution of ILM concepts.
Are you an informationist?