One of the more popular questions that journalists and others direct at me these days is about storage in the cloud.
I guess since EMC does storage -- and is very active in things cloud-like -- they expect us to have some nice sound bites.
Well, I have my pre-packaged answer that's suitable for the press, but -- if you have the time and the interest -- the deeper answer is much more engaging.
Let's Get Started
Since there are so many definitions of cloud floating around in the sky these days, it's probably helpful to frame what I mean when I say "cloud":
- aggregated and abstracted resource pools of compute, network and storage
- a dynamic consumption model where applications acquire and release resources as their needs change
- an implied oversubscription model that presumes uncorrelated resource demands
- some notion of geographical distribution across time zones
- an optimized operational model that is designed around service delivery, rather than traditional resource pools and processes.
Your definition may be different -- but, since it's my blog -- I thought I'd start with mine!
Is It All About Capacity?
Go to any industry cloud discussion, and the storage conversation will inevitably shift to "cheap" and "big".
Web 2.0 companies (e.g. Facebook) are frequently used as the prime example.
If you're a web 2.0 company, "cheap" is really important, since so many web 2.0 companies have business models that demand very inexpensive storage.
And "big" is important since there are many examples of web 2.0 companies who have gotten surprisingly big, and thus need extremely large storage farms.
I do have to point out that a surprisingly small fraction of IT spend goes to the newer web 2.0 companies compared to the more boring (but much larger!) traditional enterprise IT consumers, so using these folks as an example can lead you astray in some cases.
However, if you're close to storage technologies, you'll realize that cheap is really a function of the service level delivered.
Take any disk drive and make it bigger -- it'll become cheaper -- and slower.
Spin it down -- it'll become even cheaper -- and even slower.
Start deduplicating or compressing data on it -- again, even cheaper and even slower.
Go from disk to tape -- potentially even cheaper and slower again!
Conversely, make multiple copies on multiple disk drives, and the picture reverses -- things get faster, more available -- and more expensive.
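To make "cheap is a function of service level" concrete, here's a minimal Python sketch that picks the cheapest option meeting a response-time target. The option names mirror the steps above, but every cost and latency figure is a hypothetical placeholder chosen only to preserve the "cheaper means slower" ordering; none of it reflects real drive, array, or tape pricing.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Tier:
    name: str
    relative_cost: float  # hypothetical cost per GB, arbitrary units
    latency_ms: float     # hypothetical typical response time


# Made-up figures that only preserve the ordering described in the text:
# each step down the list is cheaper and slower than the one before it.
TIERS = [
    Tier("multiple copies on multiple drives", 8.0, 2.0),
    Tier("single drive", 4.0, 5.0),
    Tier("bigger drive", 2.0, 10.0),
    Tier("spun-down drive", 1.5, 20_000.0),
    Tier("spun-down, deduplicated drive", 1.0, 25_000.0),
    Tier("tape", 0.3, 60_000.0),
]


def cheapest_tier(max_latency_ms: float) -> Optional[Tier]:
    """Return the cheapest tier that still meets the latency target, or None."""
    candidates = [t for t in TIERS if t.latency_ms <= max_latency_ms]
    return min(candidates, key=lambda t: t.relative_cost, default=None)


print(cheapest_tier(10).name)  # bigger drive
print(cheapest_tier(3).name)   # multiple copies on multiple drives
```

Ask for 10 ms and you land on the bigger drive; ask for 3 ms and you're paying for multiple copies. Relax the target far enough and tape wins.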
Or Is It About Service Levels?
I would argue that cheap storage is relatively easy -- and big storage is relatively easy -- but what is *not* easy is getting the right data at the right service level at the right time.
Indeed, at one of the panels I attended at the recent GigaOm conference, a few of the web 2.0 IT architects pointed to solid state disk as the "single most important technology" to them going forward.
You may be surprised by this statement -- I'm not.
Now, start throwing global network latencies into the picture (also a determinant of perceived service levels, as well as cost), and things get a lot more interesting, don't they?
Extending The Definition of Storage and Clouds
If you go back to the definition I outlined above, we can get even more precise:
- Aggregated and abstracted resource pools of compute, network and storage
This implies that not only storage capacity is aggregated and pooled, but also storage bandwidth and response times.
It also implies that storage is conveniently abstracted, ideally in such a way that complements the abstraction models being used by servers and networks. Hint: think virtual machines.
- A dynamic consumption model where applications acquire and release resources as their needs change
Well, we know that when it comes to storage capacity, the meter goes in only one direction -- more storage.
But when it comes to storage performance (and perhaps availability) a different picture emerges.
It implies the ability to move pools of information dynamically from very slow/cheap to very fast/expensive storage (and back again). Hint: think technologies such as EMC's FAST (Fully Automated Storage Tiering).
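As a rough illustration of the idea (and *not* a description of how FAST itself decides placement), here's a Python sketch of a generic frequency-based tiering policy: count accesses per extent over a recent window, keep the busiest extents on a small fast tier, and let everything else fall back to the slow tier. The extent IDs, the access-log format, and the `fast_capacity` parameter are all hypothetical.

```python
from collections import Counter


def place_extents(all_extents, access_log, fast_capacity):
    """Assign each extent to 'fast' or 'slow' from one window of access counts.

    all_extents:   every extent ID in the pool (hypothetical identifiers)
    access_log:    one entry per I/O observed in the last window
    fast_capacity: how many extents fit on the fast (expensive) tier
    """
    counts = Counter(access_log)  # extents never accessed count as 0
    # Busiest first; ties broken by extent ID so the result is deterministic.
    ranked = sorted(all_extents, key=lambda e: (-counts[e], e))
    # Only promote extents that were actually touched in this window.
    hot = {e for e in ranked[:fast_capacity] if counts[e] > 0}
    return {e: ("fast" if e in hot else "slow") for e in all_extents}


extents = ["a", "b", "c", "d"]
print(place_extents(extents, ["a", "a", "a", "b", "c"], fast_capacity=1))
# {'a': 'fast', 'b': 'slow', 'c': 'slow', 'd': 'slow'}
print(place_extents(extents, ["d", "d", "c"], fast_capacity=1))
# {'a': 'slow', 'b': 'slow', 'c': 'slow', 'd': 'fast'}
```

Run it once per window and data moves in both directions: an extent that was hot gets promoted, and one that has gone quiet gets demoted again.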
- An implied oversubscription model that presumes uncorrelated resource demands
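When many applications share one pool and their demands are uncorrelated, their peaks rarely line up: the array sees a large, fairly random blend of I/O from many tenants rather than a few predictable workloads, and it can be sized for the aggregate rather than for the sum of every application's peak. This sort of performance profile changes the way you think about storage array design, as well as storage network design.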
If you think about it, this is a very different aggregate performance profile for storage, isn't it? Traditional measurements and benchmarks (think about our old friend the SPC for example) are utterly useless in this world.
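Why uncorrelated demand matters can be shown with a bit of arithmetic. For independent workloads, variances add, so the standard deviation of the aggregate grows with the square root of the number of workloads; if they all peak together, it grows linearly. The Python sketch below sizes a pool at mean plus k standard deviations, which is a simplifying assumption (it treats aggregate demand as roughly bell-shaped), and the workload counts and IOPS figures are hypothetical.

```python
import math


def required_capacity(n, mean_iops, std_iops, k=3.0, correlated=False):
    """Capacity needed to cover aggregate demand at mean + k standard deviations.

    Assumes n identical workloads, each with the given mean and standard deviation.
    correlated=True models workloads that always peak together (correlation 1);
    correlated=False models fully independent workloads (correlation 0).
    """
    aggregate_mean = n * mean_iops
    aggregate_std = n * std_iops if correlated else math.sqrt(n) * std_iops
    return aggregate_mean + k * aggregate_std


# Hypothetical: 100 workloads, each averaging 100 IOPS with a std dev of 50 IOPS.
print(required_capacity(100, 100, 50, correlated=True))   # 25000.0
print(required_capacity(100, 100, 50, correlated=False))  # 11500.0
```

Under these made-up numbers the independent case needs less than half the capacity of the correlated one. That gap is what oversubscription banks on, and it shrinks quickly if tenants' peaks start to line up.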
- Some notion of geographical distribution across time zones
Having the right information in the right place at the right time dramatically improves end user application performance and can dramatically reduce associated network costs.
Indeed, you've seen a healthy dose of that thinking with the current EMC Atmos product. It solves an interesting use case for geographically distributed storage models, but not every use case -- which implies that you'll probably see more along these lines from EMC and other vendors before too long.
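Here's a minimal Python sketch of one way to think about "right place" under simplifying assumptions: a single copy of the data, a known set of candidate sites, and a request-weighted average round-trip latency as the only cost. The region names, site names, and latency numbers are hypothetical, and this is not a description of how Atmos actually places objects; a real system would also weigh replication, bandwidth cost, and protection.

```python
def best_site(requests_by_region, latency_ms, sites):
    """Return the site with the lowest request-weighted average latency.

    requests_by_region: {region: request count} for the current period
    latency_ms:         {(region, site): round-trip latency in ms}
    """
    total = sum(requests_by_region.values())

    def avg_latency(site):
        return sum(count * latency_ms[(region, site)]
                   for region, count in requests_by_region.items()) / total

    return min(sites, key=avg_latency)


# Hypothetical round-trip latencies (ms) from each user region to each candidate site.
LATENCY = {
    ("us", "us-east"): 20, ("us", "eu-west"): 90,
    ("eu", "us-east"): 90, ("eu", "eu-west"): 15,
    ("apac", "us-east"): 180, ("apac", "eu-west"): 160,
}
SITES = ["us-east", "eu-west"]

# As the working day moves across time zones, so does the request mix.
print(best_site({"us": 700, "eu": 200, "apac": 100}, LATENCY, SITES))  # us-east
print(best_site({"us": 100, "eu": 800, "apac": 100}, LATENCY, SITES))  # eu-west
```

The same data has a different "right place" at different hours, which is exactly why geographic distribution across time zones shows up in the definition.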
- An optimized operational model that is designed around service delivery, rather than traditional resource pools and processes.
If you parse this statement, you'll realize the implication is that storage in the cloud isn't managed the way we traditionally manage storage today: it simply becomes an extension of the service being delivered.
Those of you who are doing long-term career planning as storage architects and administrators might take note of this thought.
Should We Be Talking About Cloud Storage vs. Cloud Compute?
I do have to share one of my personal biases -- the whole category of 'cloud computing' is probably misguided, at least in my book.
Computing in the cloud seems relatively straightforward. Lots of different ways to do it. My belief is that private clouds -- based on virtualization -- will be the dominant model for most enterprise IT shops.
However, getting the right information to that application, in the right location, at the right cost, at the right service level, at the right protection level, while keeping everything secured -- well, that just seems so much more challenging, doesn't it?
We'll see where the discussion goes in the future, won't we?
I like the distinction between cloud compute and storage. Storage has been far more virtual for years as its process cycles are more conducive to aggregation than instruction sets in micro-architectures.
Do I hear you saying web 2.0 is a small part of the market EMC finds less interesting because of their requirements?
Posted by: James | June 26, 2009 at 08:18 PM
Excellent post, Chuck. I like how you've broken down the broad category of cloud storage with some specific, testable axioms.
One small point of clarification. I often hear a division of the storage world into "Web 2.0" and "enterprise." The third group often left out is software-as-a-service (SaaS) providers.
Companies like ours use storage and are quite distinct from Web 2.0 applications or internal enterprise needs. And while we don't give away our services for free like Web 2.0 companies, we do have price-competitive businesses and gross margin pressures. As such, cost is important to us as well.
In any case, it might be worth considering the needs of this category in your future analysis.
Posted by: Nick Mehta | June 29, 2009 at 02:09 AM
Agree on the distinction between cloud compute and cloud storage. Seems competition is cohering along these lines.
"Having the right information in the right place at the right time dramatically improves end user application performance and can dramatically reduce associated network costs."
This makes a lot of sense. Will Google File System (GFS) characteristics become more prevalent in this space? And is objects per second (OPS) a more useful measurement than, say, IOPS? To wit:
http://wikibon.org/wiki/v/DataDirect_Networks_aims_at_cloud_storage
Posted by: Dave | June 29, 2009 at 06:53 AM
Agree with your last point on security. In fact, in one of the forums, UK Sun CTO Wayne Horkan was quoted as saying that if cloud computing becomes a utility, it was important that the UK as a nation state had good security of supply. There will be cost issues as well.
Posted by: sanjeev | July 01, 2009 at 09:13 AM