One of the benefits of frequent customer interaction is that you get to spot rising -- and sometimes declining -- customer interest in various topics.
Object storage has been around for a while -- a few people understand it, but many don't. And in the last few weeks, I've found myself getting into object storage discussions with customers without really intending to.
And the ways we get there are interesting as well ...
How Does This Come About?
On an average week, I'll probably see between three and seven customers in EMC's Executive Briefing Center. The average EBC visit has some distinct components: the customer shares what's going on in their world, we share what's going on in ours, and then a sequence of focused deeper dives: storage, security, backup, process, real-world experiences, demos, etc.
Since the customer or partner is making such an enormous investment in time and effort to come see us, we want to do a lot of prep work to make sure the topics are beefy and relevant, and not just sequential product pitches.
On an average day, there are usually 10 or so groups coming through Hopkinton; more if you count our EBCs in places like Cork and Santa Clara. The norm used to be one-day visits; now it's more like two, and sometimes three. We'll start off together as a group, and then break into sub-tracks depending on different people's interests.
My favorite spot is near the beginning of the visit: set the stage, get the dialog going, and so on. I usually try to structure my time so that we have some -- well -- unstructured time. I enjoy probing around on likely topics that aren't formally on the agenda for one reason or another, and frequently I find pure gold.
Maybe they didn't think EMC had something to say about the topic. Maybe it's a half-formed thought that's growing in importance. Maybe it's just idle curiosity about one thing or another. It doesn't matter, really -- we end up getting into topics that are more interesting to the customer, and -- hence -- more interesting to me.
How Do We Get Onto The Subject Of Object Storage?
What strikes me as interesting are the multiple paths that get us to essentially the same subject. Not that I'm intentionally steering all these conversations to a place where we have an arguable technology and experience advantage -- it's just where we end up.
For example ...
"We're getting sued a lot"
One frequent starting point is that the company is finding itself in more lawsuits over time. Legal action means discovery of information assets, plus associated workflows that support specific legal processes.
Although EMC has popular application offerings in this category, there's an underlying set of capabilities needed that can support not only eDiscovery, but other facets of information management as well.
Using eDiscovery as a simple example, I tend to boil it down to three aspects: you need to generate metadata, you need to store and manage metadata, and you need to use metadata intelligently.
Most people have found limited success with formal (and static) approaches to categorizing and tagging information. It all gets lost in a thicket of MDM-like debates and requirements to build applications that are smart about taxonomies and associated semantics. For narrow, specific and well-defined business processes -- sure, this can work well -- but as a broad-based approach it is most decidedly unappealing.
Using search-oriented and context-aware technologies to do this is much more productive as far as I can see. You can easily spot trending topics in the enterprise information base, and then decide how you want to handle them. Not perfect, but very flexible and straightforward.
And "using metadata" in an eDiscovery context is also rather straightforward: there are some well-defined workflows and capabilities that these people need -- again, supported by the schemas around the data, instead of the data itself.
The interesting part is how best to associate data with metadata. Historically, this has conceptually been a big database pointing at files or other items. But the data and metadata can easily get out of sync. Scaling problems can be a concern over time. And, most concerning, it can be easy to bypass the metadata and get at the data (or change it) directly.
Intellectually, most people can come around to the object storage view: metadata (and associated policy implications!) is directly associated with the object or file. There's no easy way to bypass the mechanisms to view or modify the underlying data. Things never get out of sync, and any potential scaling limits are w-a-a-y-y-y out there.
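To make that view a bit more concrete, here's a minimal sketch -- hypothetical Python classes for illustration, not any particular EMC product API -- where the metadata travels with the object, is enforced on every access path, and answers eDiscovery-style queries without touching the content itself.

```python
# Minimal sketch: metadata lives with the object, and there is no side door
# to the content that bypasses it. Hypothetical classes for illustration only.

import hashlib
from dataclasses import dataclass, field


@dataclass
class StoredObject:
    content: bytes
    metadata: dict = field(default_factory=dict)   # tags, matter IDs, holds ...


class ObjectStore:
    def __init__(self):
        self._objects = {}

    def put(self, content: bytes, metadata: dict) -> str:
        # Content-addressed ID: the metadata can never silently point at
        # something other than what was actually stored.
        object_id = hashlib.sha256(content).hexdigest()
        self._objects[object_id] = StoredObject(content, dict(metadata))
        return object_id

    def delete(self, object_id: str) -> None:
        # Policy is enforced here, at the access layer, for every caller.
        if self._objects[object_id].metadata.get("legal_hold"):
            raise PermissionError("object is under legal hold")
        del self._objects[object_id]

    def find(self, **criteria) -> list:
        # An eDiscovery-style query runs against the metadata, not the bytes.
        return [oid for oid, obj in self._objects.items()
                if all(obj.metadata.get(k) == v for k, v in criteria.items())]


store = ObjectStore()
oid = store.put(b"Q3 board deck", {"matter": "case-1142", "legal_hold": True})
print(store.find(matter="case-1142"))   # -> [oid]
# store.delete(oid)                     # raises PermissionError: legal hold
```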
"We don't want to be sued"
Healthcare -- and its need to manage patient health records of all varieties -- is a similar situation. There are rules about how the information must be managed. You can attempt to define, enforce and monitor this at the application layer, or farther down at the information access layer.
For those people who approach it as an application-layer issue (completely separate from the storage and information access layer), it can be a rough go to convince them of the challenges they're potentially signing up for by pursuing this approach. I mean, people do it this way all the time, so there's plenty of industry precedent for doing it the hard way.
If we're talking about a broad portfolio of applications (vs. a clustered group of more narrow ones), that's a key point. If there are increasingly challenging regulations that must be provably complied with, that's another. And if they'd like to spend less on the overall IT effort, that's yet another argument for an object approach.
In the EMC portfolio, that's usually something like a Centera sitting behind the application and access layers. There are strong built-in capabilities to define, enforce and prove compliance policies at an information infrastructure (vs. application) layer. And, of course, the IT guys like it because it's a dead-simple environment that's extremely cost-effective and easy to manage.
But you do have a choice.
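For a flavor of what defining and enforcing policy at the infrastructure layer (rather than in each application) can look like, here's a tiny sketch -- a made-up interface, not the actual Centera API -- where the retention clock belongs to the store itself, so every application above it inherits the same provable behavior.

```python
# Tiny sketch of retention enforced by the store itself rather than by each
# application. Made-up interface, illustrative only.

import time


class ComplianceStore:
    def __init__(self):
        self._records = {}

    def write(self, key: str, content: bytes, retain_days: int) -> None:
        # The retention clock is owned by the infrastructure; applications
        # can't forget it, shorten it, or work around it.
        self._records[key] = {
            "content": content,
            "retain_until": time.time() + retain_days * 86_400,
        }

    def delete(self, key: str) -> None:
        if time.time() < self._records[key]["retain_until"]:
            raise PermissionError("retention period has not expired")
        del self._records[key]


store = ComplianceStore()
store.write("patient-784/visit-2010-09-21", b"...", retain_days=7 * 365)
# store.delete("patient-784/visit-2010-09-21")  # refused for seven years
```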
"My file systems are crumbling ..."
There are interesting pockets of the IT landscape where people routinely store a lot of files: hundreds of millions to many, many billions. There are no really demanding bandwidth requirements, it's just lots and lots of things. The files don't change very much, either.
Sure, in some cases you can use lots of more modest filers, but what if you want to look at it as one big entity, vs. a large collection of smaller ones? File-oriented global name spaces only get you so far -- at some point, the idea of a hierarchical file system starts to break down from a scalability and manageability perspective (not to mention operational efficiency) and people start looking for another class of solutions, rather than just a bigger filer.
And anyone who's got one of these environments has the nagging concern that -- some day -- some effort will need to be made to categorize, tag and start to intelligently manage all this stuff. And a "database-pointing-at-a-file-system" approach is a complete non-starter, due to scale.
Again, performance and management effort are essentially linear with object storage solutions that are inherently built to scale out -- rather than a modest object layer simply hoisted over a traditional filer.
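As a rough illustration of "inherently built to scale out": a flat namespace where objects land on nodes by hashing their IDs, so capacity and throughput grow roughly linearly as nodes are added. The node names and the naive modulo placement below are assumptions for illustration; a real platform would use something closer to consistent hashing so that adding a node doesn't reshuffle existing objects.

```python
# Rough sketch of a flat, scale-out namespace: no directory tree, objects are
# spread across nodes by hashing their IDs. Illustrative placement scheme only.

import hashlib


class ScaleOutStore:
    def __init__(self, nodes):
        self.nodes = list(nodes)
        self.shards = {n: {} for n in self.nodes}

    def _node_for(self, object_id: str) -> str:
        # No directories, no tree to rebalance -- just a hash of the ID.
        h = int(hashlib.md5(object_id.encode()).hexdigest(), 16)
        return self.nodes[h % len(self.nodes)]

    def put(self, object_id: str, content: bytes) -> None:
        self.shards[self._node_for(object_id)][object_id] = content

    def get(self, object_id: str) -> bytes:
        return self.shards[self._node_for(object_id)][object_id]


store = ScaleOutStore(["node-a", "node-b", "node-c"])
for i in range(1_000):                       # billions in practice
    store.put(f"scan-{i:09d}", b"...")
print(sum(len(s) for s in store.shards.values()))   # load spreads roughly evenly
```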
In the EMC portfolio, that tends to be either Centera or Atmos -- depending. But there's a key difference that has to do with geography.
If all the information -- and its users -- are rather localized, a Centera does just fine. There's some good replication to make copies for protection purposes, but the underlying thinking is that information generators and information consumers don't have latency concerns.
"... and we need a global model"
But -- as the world goes more global -- it's becoming more frequent that the people generating and using the information are not only scattered around the world, but their access patterns are notoriously hard to predict. And -- let's face it -- latency sucks.
That's where the Atmos model shines: it uses its metadata to automatically relocate data (or restrict its relocation!) to meet performance, cost and availability constraints. Not to mention comply with the growing tangle of regional laws around what kinds of information can freely travel, and what kinds can't :-)
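Here's a small sketch of that metadata-driven placement idea -- the site names and rules are invented for illustration, and this is not the Atmos policy language -- where the set of sites allowed to hold copies of an object is decided purely from its metadata.

```python
# Sketch of metadata-driven placement: which sites may hold copies of an
# object is decided purely from its metadata. Sites and rules are invented.

SITES = ["boston", "london", "singapore"]

PLACEMENT_RULES = [
    # (predicate over metadata, sites allowed to hold replicas)
    (lambda md: md.get("jurisdiction") == "EU", ["london"]),
    (lambda md: md.get("tier") == "hot", SITES),   # copies near every consumer
]
DEFAULT_SITES = ["boston"]


def sites_for(metadata: dict) -> list:
    for matches, sites in PLACEMENT_RULES:
        if matches(metadata):
            return sites
    return DEFAULT_SITES


print(sites_for({"jurisdiction": "EU"}))   # ['london'] -- data stays in-region
print(sites_for({"tier": "hot"}))          # replicated everywhere, latency drops
```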
I've met people who've tried to solve this problem using traditional filers, global name spaces and some sort of brute-force replication or caching approach. To a person, they haven't been happy with the results. No surprise, really.
"... and our objects are getting really, really big".
Most people think in terms of smaller objects: powerpoints, documents, etc. The game changes a bit when you start considering the newer larger objects: videos (esp. HD) and -- more recently -- self-contained complex data objects.
One interesting discussion involved the need to manage a growing catalog of over 2,500 self-contained VMware images -- all of which not only had to be globally distributed, but were subject to compliance and archival restrictions.
Another customer was a proficient user of business analytics (think Greenplum) and had a new requirement to start to manage and preserve the point-in-time data marts that led to key business decisions.
A half-terabyte object was becoming the norm here. And there were lots and lots of them.
For these scenarios, filesystem constraints measured in terabytes are non-starters -- the objects themselves can approach that size, and even a multi-petabyte-class architecture seems like only a temporary stopgap to some.
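As a sketch of how objects that big are typically handled -- a hypothetical helper, with the 64 MB chunk size purely an assumption -- the object is split into fixed-size chunks, each stored (and placed) independently, and stitched back together by a manifest, so no single file system ever has to hold the whole thing as one piece.

```python
# Sketch of streaming a very large object into independently stored chunks,
# tied together by an ordered manifest. Hypothetical helper; sizes assumed.

import hashlib
import io

CHUNK_SIZE = 64 * 1024 * 1024   # 64 MB per chunk -- an assumption, not a spec


def store_large_object(stream, put_chunk, chunk_size=CHUNK_SIZE) -> list:
    """Stream a (possibly multi-hundred-GB) object into chunks; return the manifest."""
    manifest = []
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            break
        chunk_id = hashlib.sha256(chunk).hexdigest()
        put_chunk(chunk_id, chunk)       # each chunk can land on a different node
        manifest.append(chunk_id)
    return manifest


# Toy usage with a tiny stream and a tiny chunk size:
chunks = {}
manifest = store_large_object(io.BytesIO(b"abcdefghij"),
                              lambda cid, data: chunks.setdefault(cid, data),
                              chunk_size=4)
print(len(manifest))   # 3 chunks: "abcd", "efgh", "ij"
```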
Environmental issues like space, power and cooling can be very important as well when you're considering this sort of scale. For example, you wouldn't believe how big a deal it can be that an object storage platform can intelligently spin down drives that it's not using actively.
The quaint notion of "backing up" an environment this size needs to change as well; advanced information protection needs to be built in, not bolted on.
Many Paths -- One Destination?
I could take the easy way out, and simply state that all of these situations are nothing more than manifestations of our expanding digital universe -- an average ~60% CAGR in the amount of information we're all generating and storing.
But -- as William Gibson once famously said: "the future has already arrived, it's just not evenly distributed yet".
I'm frequently getting exposure to the places in the industry where the future is already here for some people: they're being strongly encouraged to drop the historical paradigm, and embrace the next one. It's just that the roads to that point (object storage in this example) can be very different indeed.
By the way, if you can get your team out to Hopkinton at some point for an EMC EBC, I'd love to chat ...
Interest in "object" storage should have heated up a decade ago Chuck. Frankly, I am very disappointed that EMC hasn't done more--much more--since it bought its way into information management with Documentum several years ago. I was expecting to see a savvy storage company move core information management functionality down into the storage infrastructure where it belongs. That simply has not happened for whatever reasons (likely more political than practical).
The storage industry needs to move away from the notion that businesses manage simple information objects. Most businesses manage webs of information assets where just as much of value lives in the webs (i.e. in the relationships/metadata) as in the individual files and database records.
It is in this context that the concept of an "object" becomes diluted. Is an object always an individual file or sub-file component? Or could an object also be a loosely coupled collection of widely distributed information assets and metadata (a "web") managed as a whole?
Storage technology must understand how to preserve and protect not only the individual storage and information assets, but the webs as well. Think about the typical EMC engineering or marketing project. Think about the number and variety of information assets created, modified, related, exchanged and distributed over the life of the project. Now ask yourself, how do you protect those assets? How do you preserve them? How do you ensure that the dots remain connected, the relationships unbroken in backups and archives? How is it applied to managed information (e.g. assets in an ECM)? What about the information not currently formally managed?
Posted by: josephmartins | September 21, 2010 at 09:45 AM
Hi Joseph
You and I both saw the opportunity for a more integrated end-to-end approach. However, no matter how compelling the architecture, it has to be packaged in a way that people can consume it.
Early on, we had meetings with larger customers where we'd share our thoughts in this arena. Occasionally, a light bulb would go on, which would then quickly dim as more pragmatic organizational boundaries and processes were considered.
So, we made a decision to focus on areas of integration that people could consume without waiting for all the pieces to fall in place. The underlying technology can easily integrate as you suggest (and is doing so in a relatively few situations), but I don't think the market is entirely ready to consume an end-to-end approach.
Probably a better topic over beers :-)
-- Chuck
Posted by: Chuck Hollis | September 21, 2010 at 01:13 PM
I understand Chuck. We've encountered the same attitude from our clients and their clients over the years.
Your customers are feeding you the same old lines that we've heard thousands of times from people who are, frankly, reluctant to make real change within their organizations.
They say they have immediate needs and problems to solve and can't really afford to think about the future. In fact, they're inclined to focus on short term objectives (often at the expense of long-term impact) because that's how many of their employers have structured incentives. And today's job-hopping IT types aren't going to stick around long enough in one organization to experience the long-term impact of their decisions. To many of them, the future is simply not their responsibility. It's a mess left for someone else to clean up.
(Sounds a lot like the general attitude toward social and environmental issues, doesn't it?)
The problem with that line of reasoning, in the context of long-term information management, is that it often results in short-sighted decisions. The decisions today about the technology to adopt, the processes to implement and the people to hire MUST be made in the context of long-term objectives. Otherwise some unfortunate employees a few years down the road are going to inherit even greater pain, and a substantially higher cost to fix what should have been fixed years prior.
Every presentation I've ever done on information management ends with the line "pay me now or pay me later" because eventually customers are going to pay the piper. The challenge is in convincing them that it's only going to get worse the longer they wait.
Hope to see you around at EMC's Analyst Day to continue the discussion.
Posted by: josephmartins | September 22, 2010 at 11:12 AM
I think you two are talking about the same thing. I work in the ECM space, so I know intimately how the objects and metadata are managed in that environment. The various ECM offerings, including Documentum, will probably be commoditized or clouded before an end-to-end "Object Storage" system can be realized. The development of Microsoft's SharePoint attests to this trend. But the true end game is going to be replacing "File Storage" as we know it with real "Object Storage." I believe this even though the user community will not be ready for it for a while, and the technology that will enable true "Object Storage" is not ready yet and may not be available for quite some time. While a ubiquitous "Object Storage" system that could replace the current de facto "File Storage" model spearheaded by Microsoft might be a dream at this stage, I believe it will eventually materialize. This may be a better topic over beers, but all the more so over spirits, perhaps. Cheers!
Posted by: shiningarts | September 29, 2010 at 07:15 PM