Well, there certainly are many who think so.
On the vendor side, I'd put 3Par, IBM, NetApp and doubtless a few others I forgot to mention.
Being the ever-pragmatic marketeer, I've been guilty of slapping new labels on old ideas a few times over the years -- and sometimes gotten away with it.
But in this case, I don't think it's going to work. Clouds are very different -- or will be very soon.
Why Are Clouds Different?
As a starting point, let me replay my uber-simple cloud definition that seems to demystify about 90% of the discussion.
First, clouds are built differently than traditional IT -- think pools of resources with dynamic allocation of services.
Second, clouds are operated differently than traditional IT -- low-touch models become zero-touch models.
Third, clouds are consumed differently than traditional IT -- generally, pay for what you use in a convenient fashion.
Private clouds go a bit further, and assume you're doing this with virtualization (presumably VMware) in such a way that you can do this behind the firewall, using external service providers, or any dynamic combination of the two.
Built differently, operated differently, consumed differently -- how do these concepts apply to our nascent category of "cloud storage"?
Cloud Storage Is Built Differently
Architecturally, most people agree that scale-out, clustered environments are, generally speaking, the best approach for building individual storage devices in this world. The idea is to scale out, not up, and do so using reasonably-sized building blocks that use commodity technologies.
No real argument on those points, until a fierce debate erupts here as to who's got the best scale-out, the best cluster, etc. I think these people are too close to the problem, and need to step back a bit.
What gets interesting is when you start framing the "cluster" as geographically dispersed -- e.g. your storage architecture isn't limited to a single footprint in a single location.
What then?
As I mentioned in an earlier post ("Overcoming Distance"), latency is not our friend. Sure, it's not an issue if everyone happens to be reasonably co-located with your "cloud storage", or doesn't care about performance, or has no legal restrictions on where data lives -- but that's a very limiting set of assumptions, isn't it?
From my (undoubtedly biased) point of view, I can't see something as "cloud storage" unless it understands and can compensate for geography. Example: Atmos has a policy-driven geographic optimization capability that's part of the storage, and not some bolt-on. It makes and moves copies of data to vary performance, cost and availability.
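To make that concrete, here's a minimal sketch of what a metadata-driven placement policy could look like. The tag names, locations, tiers and the place_object() helper are all hypothetical illustrations -- not actual Atmos syntax -- but they show the shape of the idea: the policy, not an administrator, decides where copies live.

```python
# Hypothetical sketch of policy-driven geographic placement.
# Tags, locations, tiers and place_object() are illustrative only,
# not actual Atmos syntax.

POLICIES = {
    # Hot content: several copies, near consumers, on fast media.
    "active-web-content": {
        "replicas": 3,
        "locations": ["us-east", "eu-west", "apac"],
        "tier": "fast",
    },
    # Cold content: one cheap copy in one place.
    "archive": {
        "replicas": 1,
        "locations": ["us-east"],
        "tier": "cheap",
    },
}

def place_object(obj_metadata: dict) -> dict:
    """Return the placement rule selected by an object's policy tag."""
    tag = obj_metadata.get("policy", "archive")  # default to cheapest
    return POLICIES[tag]

rule = place_object({"policy": "active-web-content"})
print(rule["replicas"], rule["locations"])  # 3 ['us-east', 'eu-west', 'apac']
```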
Another popular use case is moving information out of the data center, into an external cloud automatically, securely, etc. -- and still being under the control of IT as a logical extension of their environment. You need to think about multiple storage locations operating as a single environment, and not the individual physical storage devices.
A big pile of rotating rust sitting in a data center doesn't do that -- no matter how cheap or scalable it might be. Metadata turns out to be the key to addressing this issue in a fundamental way -- as we'll see in a bit.
Cloud Storage Is Operated Differently
Most of the vendors are trying to emphasize how easy it is to manage their storage as evidence that they're great for the cloud. Nothing wrong with being easy to manage, but I would argue that they're taking a legacy IT paradigm, and trying to shine it up a bit.
Perhaps the correct thought isn't "easy to manage", it's more like "doesn't have to be managed".
Single pane of glass is nice; no pane of glass is even better. Or was that "glass of pain"? I forget :-)
To do this effectively, though, requires a few things that you don't find in traditional SAN and NAS models. First, the management abstraction isn't storage itself, it's "storage service supporting applications". You want to see how the service is being delivered, and whether it might be time to consider adding more resources, or perhaps changing a policy.
If you have the luxury of application integration, the model gets even better -- the application or service tells the storage what it wants through the magic of metadata. You get essentially a zero-touch model. Storage management can disappear entirely as a separate discipline.
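As a rough illustration of that handoff, here's a sketch of an application declaring its intent at write time. The endpoint, header names and tags are hypothetical (loosely patterned on REST-style object storage with metadata headers), and authentication is omitted for brevity.

```python
# Hypothetical sketch: the application tells the storage what it wants
# via metadata at write time; no separate storage-management step.
# Endpoint, header names and tags are illustrative; auth is omitted.
import requests

def store_with_intent(data: bytes, policy_tag: str) -> str:
    resp = requests.post(
        "https://storage.example.com/rest/objects",
        data=data,
        headers={
            "Content-Type": "application/octet-stream",
            # The application declares intent; the storage cloud
            # figures out where and how to keep the object.
            "x-meta-policy": policy_tag,
            "x-meta-retention": "7y",
        },
    )
    resp.raise_for_status()
    return resp.headers["Location"]  # identifier of the new object

object_id = store_with_intent(b"...quarterly report...", "compliance-archive")
```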
Again, metadata becomes a key concept in making this all happen.
Cloud Storage Is Consumed Differently
Yeah, yeah -- multi-tenancy, thin provisioning, maybe some dedupe -- I get all that. But there's nothing cloud-specific about that, is there?
I will argue that we need to think beyond mere capacity.
One of the things we want our cloud model to do is to be able to dynamically vary storage service levels as the world changes -- from very slow (and cheap) to very fast (and presumably more costly) -- and back again.
Or from a single copy that's never backed up to multiple, redundant and snapshot copies -- and back again. Or lots of copies spread around the world to increase performance and resiliency -- and back again.
And to do all these acrobatics just when we need them, without having to decide ahead of time. Even better if it provides granular metering for precise billing and consumption.
Networks do similar things today. Fully virtualized compute environments as well. Why not our cloud storage?
Metadata helps us do this as well: basically, "all data associated with this tag now has to behave in this different way" -- wherever it might be.
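Here's a tiny sketch of those mechanics, in the same spirit as the hypothetical policy table earlier: you redefine the tag once, and every object carrying it inherits the new behavior, with no per-object migration.

```python
# Hypothetical sketch: vary service levels by redefining a tag,
# never by touching individual objects.
POLICIES = {
    "quarterly-reports": {"tier": "cheap", "replicas": 1, "backed_up": False},
}

def redefine_policy(name: str, **changes) -> None:
    """Update one tag definition; every object carrying that tag
    inherits the new behavior, wherever in the world it lives."""
    POLICIES[name].update(changes)

# Quarter-end crunch: make the data fast and well protected...
redefine_policy("quarterly-reports", tier="fast", replicas=3, backed_up=True)
# ...then dial it back down afterward. No migration scripts.
redefine_policy("quarterly-reports", tier="cheap", replicas=1, backed_up=False)
```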
Metadata: Built In, Or Bolted On?
I kicked off an interesting discussion a while back with a pair of controversial posts ("The Future Doesn't Have A File System" and "Of Objects And Files") that provoked vigorous responses from vendors who were heavily invested in block and file paradigms, but hadn't invested in newer, object-oriented and metadata-driven storage architectures.
Some of it was the usual competitive positioning, but there were a few people who just didn't understand what the fuss was about.
Since then, I've come up with a simpler framing that might help.
We need to tell information what we want it to do: where it needs to be, how many copies, how fast, how reliable, who can see it, when it can be deleted, etc. Metadata is a convenient way of thinking about these "instructions for handling the information".
If we want to change behavior, we can simply change the definition of how we interpret a given label, e.g. "old email files".
The question becomes -- where does this important metadata go?
Do we put it in multiple external repositories? Do we try to generate metadata by observing how the information is being used, or perhaps search its contents?
Or do we simply associate the metadata directly with the information, and let the storage cloud manage itself?
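The contrast is easy to see in a sketch. Both halves below are hypothetical, but they show why "bolted on" is fragile: an external catalog has to be kept in sync with the storage, while embedded metadata travels with the object, so any node can act on it locally.

```python
# Hypothetical contrast: external (bolted-on) vs. embedded (built-in)
# metadata. Names and structures are illustrative only.

# Bolted on: a separate repository that must stay in sync with storage.
external_catalog = {"obj-42": {"label": "old email files"}}

def lookup_external(object_id: str) -> dict:
    # If catalog and storage ever disagree, behavior silently breaks.
    return external_catalog.get(object_id, {})

# Built in: the object carries its own handling instructions, so the
# storage cloud can manage it wherever the object happens to land.
stored_object = {
    "id": "obj-42",
    "payload": b"...",
    "metadata": {"label": "old email files", "copies": 2},
}
```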
Beyond The Pile Of Disk?
I fully understand that cloud terminology is very buzzworthy these days. And I don't really blame any vendor that attempts to dress up a few existing offerings as "cloud-ready". We all have to make a living.
But I think that when we start talking about cloud storage, there needs to be a discussion about what needs to change, rather than what stays the same.
Your thoughts?
Speaking of cloud - you may enjoy this one (we just put it up):
Cloud Cloud Maybe - a parody
http://blogs.vembu.com/2009/10/cloud-cloud-maybe/
It takes off on all the usual suspects - Amazon, Nirvanix, Parascale, Sun Cloud (?), Azure etc - and yes, Atmos too.
Posted by: Lux | October 09, 2009 at 09:43 PM
I find it ironic that you say you don't blame any vendor that attempts to dress up a few existing offerings as "cloud-ready" because we all have to make a living.
Isn't that exactly what you're doing with the acquisition of Kazeon?
Posted by: Travis Brown | October 10, 2009 at 04:07 AM
@Travis
If there was a consensus definition around "cloud", we'd have a better shot at defining "cloud ready".
I am not close to the Kazeon product, but here's what I would be looking for: can be consumed as a software service and/or in traditional on-premise models, can store information on geographically distributed storage devices (e.g. Atmos or similar), and maybe something else I forgot.
The motivation for the Kazeon acquisition was not primarily a cloud-related one -- I don't know why you're connecting the two concepts, since they're largely unrelated.
Over time, almost all EMC products will have to work in some sort of cloud model -- it's inevitable. Hard to single out Kazeon or anything else in this regard.
Irony is in the eye of the beholder.
-- Chuck
Posted by: Chuck Hollis | October 10, 2009 at 08:26 AM
Myself, I don't find "cloud storage" very useful yet. I mean, it's great if you can build an application that can take advantage of it, but given that most cloud storage systems are API-driven to some degree, it doesn't really help me on the IT/ops side of the company. Not only is it cumbersome to work with, but bandwidth requirements can be excessive as well. We have roughly a gigabit of committed bandwidth available to us, and even at that level performance isn't great, and latency is of course horrible.
Not saying that cloud storage is bad -- it certainly has its uses -- but it's getting frustrating that so many people out there seem to be touting the cloud as something that can just be a drop-in replacement for everything (I don't mean you, Chuck).
Working with Amazon's S3, for example, it was really frustrating to deal with the limitations and security implications of cloud storage (e.g. splitting things up into 5GB chunks, encrypting it, etc.).
Bandwidth isn't cheap, and I believe our data sets are expanding far, far faster than our pipes are. Take a look at U.S. broadband bandwidth vs. storage as an example, or look at the explosion of data usage on AT&T's network for the iPhone for another example of how inadequate, yet expensive, bandwidth is compared to local storage.
In an ideal world, cloud storage would be great, but in my opinion internet networks aren't ready for it (any more than they are ready for wide-scale streaming video -- sure, you can make it happen, but you will pay 3 arms, 2 heads and 5 legs for it), and the privacy/security concerns of running everything in a public cloud are likely to significantly hamper its adoption.
The one thing I could see happening, though, is "cloud storage" technologies being shrink-wrapped and sold to organizations that can use them to leverage their infrastructure and get higher utilization out of the gear they have. Though I think that is still a ways off.
Posted by: nate | October 12, 2009 at 06:26 PM
Cloud computing provides even more opportunity to relinquish the little bit of control that remains over the IT environment. Now we will not even know where our data is, or who we depend upon from one moment to the next for our mission-critical systems. Systems will never operate the same way from one day to the next -- they barely do now. Just don't depend on anything and we'll all be fine. Great new gimmick for business to make money, though... till the next one comes along. It will be the users -- the ones who keep the businesses and gov't departments afloat and operating every day -- who will be left to clean up after the party is over, as always.
Posted by: Jeff K. Smith | October 13, 2009 at 11:47 AM
Who cares?
Whether you call it "cloud" or not, whether it "qualifies", doesn't matter in the slightest.
The only things that matter are:
Do I have a use for this?
Does it do something for me that another technology can't?
Is it secure, I mean really secure (not just a bunch of boxes ticked off while installing each component)?
Back in the mid 90s I worked on a project management system in Oracle, and there were pieces of the system and data strewn across half the world. It worked, and it was totally transparent to the users and programmers. It was as secure as dedicated telecommunication lines could make it. It did the job, and at the time there wasn't any other way to do it.
This is pretty close to what you described, but is it cloud? Maybe. Does cloud computing require that everything be "on the net", or does a private (semi-)global network qualify?
Posted by: Paul | October 13, 2009 at 02:16 PM
@Paul
Your points around "what it does" vs. "what to call it" are very valid indeed, and bear repeating. My post is more of a reaction to various vendors in the storage industry recycling their wares, and calling it "cloud".
I'd like to keep the focus on what's different, rather than what's the same :-)
Your distributed project management application at Oracle sounds very cool indeed, but I wouldn't necessarily call it a "cloud" -- it didn't run on pooled infrastructure that was dynamically consumable, for example.
Your point on "everything on the net" is a good one -- I think the higher-order concept everyone's grappling for is "control" -- security and service delivery -- regardless of who owns the underlying infrastructure.
Thanks for commenting!
-- Chuck
Posted by: Chuck Hollis | October 14, 2009 at 12:19 PM
@Jeff
I'd agree with you that IT needs to retain control of the overall IT experience -- service delivery, security, etc. -- otherwise it will be a grim world.
One of the things I really like about the private cloud model is that it allows IT to retain control over all aspects of the IT strategy. Without someone in control of IT, it'll all turn into a big mess.
-- Chuck
Posted by: Chuck Hollis | October 14, 2009 at 12:21 PM
I got a chuckle out of comparing traditional file systems to your experience with Hollerith cards. It must have been tough going until you learned to automate the process. How tough can life at UC Santa Cruz get? Especially in the late 70's. Beach parties, the boardwalk, coeds -- I mean, how tough was it? At least you didn't have to make the trek to the computer center in the pouring rain, which it did 85% of the time where I went to school. It was automate or grow webbed feet (with the Nike logo clearly visible).
Your posting did have me thinking about metadata. At Tektronix some years back, we were involved in the whole TV station workflow automation evolutionary process. Large TV broadcasting groups were buying up the independent stations. It became clear that the opportunity to leverage content, especially in news production, was an important means to differentiation and efficiency. If executed well, this translated into ratings and advertising dollars. I began to understand the importance of rich metadata in describing content. It was so much more than simple file and directory names. It was rich descriptions of the data; the asset; the inventory.
Back then, it was often impractical to keep full-resolution video content, either in the form of Betacam tapes or even on disk drives (e.g. Profile), in every station. Instead, there was interest in metadata capture and routing between locations. Quick searches could be executed, and low-resolution edits based upon low-bandwidth copies could be done, before broadcast-quality video was delivered and prepared for transmission.
As we think about cloud computing, the extension of workflow automation where data is king comes to mind. Descriptions of the data drive how the plumbing routes it, displays it, and protects it. This can be made as manual or as automated as users want it to be. Plumbing will be built on commodity hardware, and of course the tools we have come to appreciate, such as de-duping and thin provisioning, need to be there. We could see metadata that describes metadata, and find that CPU cores and the associated memory structures are spending a great many more cycles, in addition to supporting virtualization, executing metadata that links to the execution of storage applications, locally and across the wide area. Quite possibly, one of Pat Gelsinger's contributions to EMC, on top of his leadership abilities, will be making extensive and efficient use of processors for metadata processing and handling.
Posted by: Doug Rainbolt | October 27, 2009 at 01:59 PM
@Doug
Thanks for the *very* insightful comment. For many years, the importance of metadata in information handling has been foremost in our minds, reflected in some of our products -- but there's much more to do here.
And you're right, the advent of a new wave of cheap/plentiful compute will make us look at old problems in a new light.
Thanks for sharing.
-- Chuck
Posted by: Chuck Hollis | October 28, 2009 at 10:43 AM