Last weekend, two weather systems merged off the coast of New England, creating a blizzard of "historic proportions". Neither weather system was particularly significant on its own, but combined they were quite disruptive.
And it took a while to dig out :)
I believe a similar pattern is brewing in our IT world -- two trends are on a collision course, and the likely result is a significant disruption in the storage sector.
The first trend is, of course, the software-defined data center, and all its software-defined subcategories: server, network, storage, etc.
Anything that's been virtualized has proven far easier to consume: easier to acquire, provision, and manage -- and that includes storage.
The second trend is the insatiable demand for ever more storage capacity and associated functionality. Historically a market with healthy growth, it shows clear signs of ratcheting into an entirely new gear, fueled by demand for all sorts of big data applications.
Although this particular IT mega-storm isn't here yet, you can see what's likely to happen. And all of us are advised to think ahead.
It's More Than Marketing, It's Evolution
At one level, the discussion around software-defined data centers is nothing more than an extension of familiar server virtualization concepts to other domains.
Unlike other relatively new discussions (e.g. "cloud", "big data"), this one hasn't been met with the familiar skepticism from IT professionals.
They've seen for themselves what virtualization has done to servers, and they can see the obvious potential elsewhere in the IT infrastructure stack.
It's a simple proposition: the IT infrastructure components we typically think of as physical (storage, network, security, etc.) will soon be expressed as layered software, running in virtual machines, presumably on pools of industry-standard servers.
Cost savings result from the use of standard servers.
Resource efficiency results through the ability to pool resources.
Operational efficiency results through managing integrated software entities vs. isolated physical ones.
And agility results from being able to react quickly to new requirements.
It's a win-win-win-win. We've all seen how powerful this is with server virtualization, now we're just expanding the concept to other forms of infrastructure to create the virtual data center.
But what about storage? Isn't storage all about physical capacity with 1s and 0s?
How do you express that as software?
Unpacking Software-Defined Storage Concepts
If you take the view that a storage array is nothing more than layered functionality running on industry-standard hardware, the "software defined" aspect becomes more approachable.
Just as with the software-defined data center, all we're really doing is decomposing and recomposing familiar functionality into new, virtualized layers.
While there's no standard schema yet for discussing software-defined storage, I'll share how I'm describing the emerging "stack" with customers and partners. While there are understandably a lot of moving pieces, I've mashed everything into three layers to keep it relatively simple.
At the top of the stack is what I call the "consumption layer".
As a user, here are all the familiar data presentations (block, file, object, HDFS, etc.) that one might expect to see.
There are APIs that integrate my use of storage with my application: be it web-scale, big data or something more prosaic like an Oracle database.
As a user, I see a management environment that looks like I'm the only tenant, and it ideally integrates storage activities with my usual operational workflows: provisioning, utilization, performance monitoring, etc.
Because these "consumption portals" are virtualized, they're dynamic entities -- they can be created on-demand, on top of pooled resources, and flexed as needed -- much like we create virtual servers today.
Beneath that is a broad category of what I'll call the "data services layer".
For the most part, these services are simply moving data from one place to another: snaps, replication, tiering, backup, caching, federation, etc. Add in a few things like deduplication, compression and encryption, and we're good.
Ideally, the data services are agnostic to the consumption layer: as an example, I might see what looks like "my" file systems while remaining unaware of what's happening under the covers.
Once again, because they're virtualized, these data services can be created on-demand, on top of pooled resources, and flexed as needed.
Underneath it all, there's the "data preservation layer" -- where the bits live.
The job here is conceptually simple: write data to media, and make sure it can be read back reliably at some point in the future. Just like there are different consumption models, and data services, there are many potentially different ways of doing data preservation, depending on what you're after: efficiency, scale, performance, redundancy, whether you're using disk or flash, and so on.
Once again, because this functionality is virtualized, a variety of data preservation approaches can potentially be used, ideally independent of the data services and consumption options selected above -- again, all running on a pool of commodity hardware.
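To make this a bit more concrete, here's a minimal sketch of how the three layers might stack up in software. Everything here -- the class names, the methods, the in-memory backend standing in for commodity disks -- is an illustrative assumption for this post, not any particular product's architecture or API.

```python
from abc import ABC, abstractmethod


class DataPreservationLayer(ABC):
    """Bottom layer: write bits to media and read them back reliably."""

    @abstractmethod
    def write(self, key: str, data: bytes) -> None: ...

    @abstractmethod
    def read(self, key: str) -> bytes: ...


class InMemoryPreservation(DataPreservationLayer):
    """Toy backend standing in for pooled commodity server drives."""

    def __init__(self):
        self.store = {}  # key -> bytes

    def write(self, key: str, data: bytes) -> None:
        self.store[key] = data

    def read(self, key: str) -> bytes:
        return self.store[key]


class DataServicesLayer:
    """Middle layer: data movement and management services (snaps,
    replication, tiering, dedupe, etc.) stacked on any preservation backend."""

    def __init__(self, backend: DataPreservationLayer):
        self.backend = backend
        self.snapshots = {}  # key -> point-in-time copy

    def write(self, key: str, data: bytes) -> None:
        # A real service layer might dedupe, compress, or replicate here.
        self.backend.write(key, data)

    def read(self, key: str) -> bytes:
        return self.backend.read(key)

    def snapshot(self, key: str) -> None:
        self.snapshots[key] = self.backend.read(key)


class ObjectConsumptionLayer:
    """Top layer: one of several possible presentations (block, file,
    object, HDFS, ...) exposed to a single tenant or application."""

    def __init__(self, services: DataServicesLayer):
        self.services = services

    def put_object(self, name: str, payload: bytes) -> None:
        self.services.write(name, payload)

    def get_object(self, name: str) -> bytes:
        return self.services.read(name)


# Compose a "consumption portal" on demand from pooled resources.
portal = ObjectConsumptionLayer(DataServicesLayer(InMemoryPreservation()))
portal.put_object("photo-001", b"...image bytes...")
assert portal.get_object("photo-001") == b"...image bytes..."
```

The point isn't the toy code; it's that each layer depends only on the one below it, so any of the three can be swapped or scaled independently.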
A New Reconstruction Of Storage Functionality?
Today, the vast majority of storage is consumed as an integrated "stack", usually in the form of a physical storage array -- and there are plenty of examples.
Take any popular storage array (including all of those in the EMC portfolio) and you'll see the same three concepts expressed as a fully integrated storage stack, wrapped in optimized hardware.
Software-defined storage proposes an entirely different model for presenting essentially the same functionality: abstracted layers, running in virtual machines on a pool of shared servers.
Because the model is very different, so are the potential strengths and weaknesses.
So -- even though we don't have fully-featured software-defined storage stacks in the marketplace today, we can still take a look at how these two approaches might compare and contrast.
Physical Storage Arrays Will Be With Us For A Very Long Time
The established model of deploying storage functionality in physical arrays has more than a few fundamental advantages: there's a lot of functionality at hand, and the software and hardware are tightly integrated, optimized, mature, well supported, and familiar to all of us.
Those sorts of structural advantages don't disappear overnight, do they?
That being said, I think there are going to be strong incentives to encourage people to take a close look at the new software-defined models for providing storage services in a variety of environments; some familiar, some relatively new.
#1 The Lure Of Commodity Hardware
Anyone who's been around storage is inevitably drawn to the much lower raw capacity costs associated with internal server drives vs. external storage arrays. The advent of software-defined storage products should create a new category of price points for certain use cases.
But software-on-server storage solutions have been around for a while, and they haven't exactly been widely popular as compared to the more traditional array-based approaches. I think I understand why this is, and how it can be overcome -- more on this later.
But it's important to note that there's more in play here than just the appeal of using low-cost servers for physical capacity. For one thing, that same pool of homogeneous server/storage resources can be used for workloads other than simply running storage software stacks -- driving efficiency and utilization up even further.
#2 A Storage Array Is A Server
Disassemble a storage array, take away the physical storage capacity, and you've essentially got a server: compute, memory and ports. Or take one of EMC's pure "data services" products (e.g. RecoverPoint, VPLEX, etc.) and you'll see the same thing.
Whether those server resources are running software performing data preservation functions, data management services, or familiar presentation and consumption portals (or all three), you're essentially running software stacks on server hardware.
Today, when we're sizing physical storage products, we're essentially sizing a set of server resources to get the job done. But since these server resources aren't dynamic, we have to use a "worst case" methodology: we tend to size to have enough resources on hand for peak workloads and future growth.
The typical result is familiar -- in larger environments, there's a sizable amount of "server capacity" stranded in various physical storage devices that sits idle most of the time.
Software-defined storage holds out the tantalizing potential of changing how we think about this part of the equation. Have a demanding storage workload pop up -- I/O, replication, pNFS, etc.? Throw a bunch of virtualized resources at it. Has the storm subsided? Reallocate those server resources to other tasks.
Just like we do with server virtualization today.
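Here's a deliberately simplified sketch of that flex-up, flex-down idea: a shared pool of abstract "resource units" that storage services borrow and return. The pool, the unit counts and the service names are all hypothetical, just to illustrate the allocation pattern.

```python
class ServerPool:
    """A shared pool of commodity server resources (counted in abstract
    'units') that virtualized storage services borrow from and return to."""

    def __init__(self, total_units: int):
        self.total_units = total_units
        self.allocations = {}  # service name -> allocated units

    def free_units(self) -> int:
        return self.total_units - sum(self.allocations.values())

    def flex_up(self, service: str, units: int) -> bool:
        """Grab extra resources for a spiking workload, if available."""
        if units > self.free_units():
            return False
        self.allocations[service] = self.allocations.get(service, 0) + units
        return True

    def flex_down(self, service: str) -> None:
        """Return a service's resources to the pool once the storm subsides."""
        self.allocations.pop(service, None)


pool = ServerPool(total_units=64)
pool.flex_up("replication-burst", 16)   # a demanding replication job appears
pool.flex_up("pnfs-frontend", 8)        # extra pNFS presentation capacity
pool.flex_down("replication-burst")     # job finished; resources go back
print(pool.free_units())                # 56 units free for other workloads
```

Contrast that with the "worst case" sizing of a physical array, where the same headroom would be bought up front and left stranded.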
#3 A New Provisioning Model?
The software-defined data center implies a new model for provisioning resources: one that's application centric. Here's my application, here's what it needs for resources and external services.
Applied to storage, software-defined concepts substantially change how we think about the model. A quick example? Application "A" requires large capacities, sequential bandwidth and minimal data protection. Application "B" is highly transactional, and requires advanced data protection. And application "C" wants a distributed object model with rich semantics.
Or, just to keep matters interesting, perhaps a newer application that's doing a bit of all three.
In today's storage world, you're probably talking three distinct physical storage stacks, each with its own unique management and operational model. There would inevitably be an explicit (and probably protracted) conversation between each application team and the infrastructure team about how best to meet these wildly different requirements.
In a software-defined storage world, it would be very different. You'd be working off a single pool of resources -- the same resource pool, in fact, as the application team is using. That resource pool would largely be managed in a consistent fashion, regardless of how it was being used.
Required storage services would be dynamically specified by the application team as an integral part of their workflow. No need for protracted negotiations between teams, no need to forecast and manage different physical storage resource pools, and so on.
The result? Vastly more efficient from both a resource and an operational perspective.
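As a sketch of what that application-centric provisioning might look like, each application team declares what it needs, and the layers described earlier get composed from the shared pool to satisfy it. The profile fields and service names below are my own illustrative assumptions, not a real provisioning API.

```python
from dataclasses import dataclass


@dataclass
class StorageProfile:
    """Declarative statement of what an application needs from storage."""
    presentation: str      # "block", "file", "object", "hdfs", ...
    capacity_tb: int
    protection: str        # "minimal", "snapshots", "sync-replication", ...
    performance: str       # "sequential-bandwidth", "low-latency", ...


# Each application team specifies requirements as part of its own workflow;
# no protracted negotiation with a separate storage team required.
profiles = {
    "app-a": StorageProfile("file",   500, "minimal",          "sequential-bandwidth"),
    "app-b": StorageProfile("block",   20, "sync-replication", "low-latency"),
    "app-c": StorageProfile("object", 200, "snapshots",        "throughput"),
}


def provision(name: str, profile: StorageProfile) -> str:
    """Pretend to compose consumption, data services and preservation
    choices from the shared pool to satisfy the requested profile."""
    return (f"{name}: {profile.presentation} presentation, "
            f"{profile.capacity_tb} TB, {profile.protection} protection, "
            f"tuned for {profile.performance}")


for name, profile in profiles.items():
    print(provision(name, profile))
```

Three very different applications, one pool, one consistent way of asking for storage.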
#4 A New Operational Model?
In today's siloed infrastructure model, every team runs their own plumbing. The server team looks out for the server inventory, monitors its health, takes corrective action as needed, and so on. The same story is true for the network team, the storage team and perhaps others.
Every infrastructure team has their own distinct and separate puddle of resources to look after. It makes a certain sense, because they're all somewhat different.
But in a software-defined data center, it's all the exact same pool: server, storage, network, and more. You now have one way of managing the physical resources that everyone uses to get their job done -- better pooling, better efficiency of resources and processes, and much more agility.
#5 New Storage Functionality as an OVF
Storage requirements are changing all the time, and storage vendors work hard to create new forms of functionality and integration -- usually in the form of software -- to meet these needs.
Unfortunately, a lot of this storage software is tied to specific physical devices, which makes the whole process of customer evaluation, qualification, implementation and maintenance very cumbersome. For example, if you want to evaluate a particular vendor's array, sooner or later there's going to be a good-sized piece of hardware involved.
Software-defined storage takes almost all the hardware-associated friction out of this process. It's easier for vendors to create, test and ship new functionality; and it's far easier for customers to evaluate, qualify, implement and maintain the results. Costs are reduced and agility improves for both producer and consumer.
We've all seen it in the server world; now we have the potential of seeing the same thing in the storage world.
It Takes An Ecosystem
Many of you reading this may be wondering -- aren't we seeing some of this already today? More than a few "storage products" are available today as virtual machines (including a few from EMC) -- what's holding us back from doing more?
Through this lens, you start to realize it's not just storage that has to change; everything around storage has to change as well.
In a software-defined world, storage acquisition (for example) looks more like a server buy than a storage buy. Vendors have to get very proficient at supporting critical storage functionality when they don't necessarily "own" the underlying hardware. And since application teams and packaged software vendors had a tough time accepting software-defined servers, I can guess how they'll feel about software-defined storage.
Diving deeper, while we're obviously decomposing functionality and re-assembling using a new model, there's still a strong need for end-to-end re-integration around delivering reliable and predictable storage services.
After all, it's your data we're talking about ...
The Potential For Disruption Is Great
Over the last ten years, server virtualization (or software-defined servers, if you will) has completely changed how we think about compute. It's gone from an interesting capability to the mainstay of IT infrastructure thinking.
Now those same concepts are starting to be re-applied to the storage world -- but this time, I think it will happen much faster.
For one thing, we've got a clear majority of IT professionals who are not only comfortable with virtualization concepts, but are also committed to a "virtualization first" policy. Software-defined storage is a linear extension of their thinking, and fits in nicely with their world view.
More importantly, there's a much greater demand for better storage solutions using different models. Data is proliferating far faster than compute, presentation models are proliferating, data services options are proliferating, data preservation alternatives are proliferating -- meaning that more people will be interested in taking a serious look at these new software-defined models.
But the real potential for disruption is in the storage vendor community. We've all built our businesses on large, physical hunks of hardware, and -- in this world -- that's not really part of the equation any more, is it?
While I'm quite sure there will be plenty of demand for ever-improving physical storage devices, there will soon be a growing market of IT professionals who prefer the same functionality expressed as composable, virtualized software services running on a pool of commodity hardware.
The next few years should be very interesting indeed.
Disclosure - EMCer here.
Chuck - this is indeed an interesting time, and VERY disruptive. As you said, while change doesn't occur overnight, there are material architectural considerations to these alternate models.
This is one of the things I talked about in my "expect in 2013" blog post/webcast here: http://virtualgeek.typepad.com/virtual_geek/2012/12/my-crystal-ball-for-2013.html
I have to say - I'm glad that as a company, we're embracing this as opposed to the alternative "stick head in sand" approach :-)
Posted by: Chad Sakac | February 12, 2013 at 06:04 AM
What do you think of Facebook, with a billion photos uploaded on New Year's Eve? I read somewhere they are adding 10 petabytes a month. I am sure, despite the huge challenges, this is only possible with the simplicity of a storage service for only one giant application. The software that defines the storage is embedded in the application itself?
My own experience says that if you unify the application needs (at least for files), you can scale a lot more easily. But it has to be at the app layer. With SupportCentral at GE we aggregated storage for individuals, communities and applications - applications could use the "folder" storage just like a person. This aggregated the need and allowed for the simplicity that is possible when only one application uses the storage.
Adoption? Takes a long time. It is difficult to justify projects to make this change, to transfer existing data, all of which reduce the cost savings in the short term.
In any case, since I left GE two years ago, I hear that SupportCentral folders are now increasing at nearly a terabyte a day. The aggregation is happening - but it took many years.
(BTW - your blog on SupportCentral almost five years ago, was eerily prescient - to think that GE had Box.net, Sharepoint and a whole lot more almost ten years ago!)
Posted by: Sukh | February 20, 2013 at 04:27 PM
Hi Sukh -- good to hear from you again! You're right, SupportCentral was very much ahead of its time, wasn't it?
The debate around app-specific storage vs. general-purpose storage will continue over the next few years. It seems that the ostensible middle ground is object storage -- apps can morph storage semantics and behavior by incorporating metadata and defining policy rules that the storage layer can interpret.
While that might be an attractive answer, it seems that most app developers remain stuck in either a file system or database paradigm. The tools are there, and they've proven their value -- but they so often remain on the shelf.
Good to hear from you!
-- Chuck
Posted by: Chuck Hollis | February 21, 2013 at 09:06 AM
Well said, Chuck. Innovation and disruption are being developed and released at an unprecedented pace. You have to love what's occurring in the industry.
Posted by: Vaughn Stewart | March 26, 2013 at 08:19 PM
That was a great article! I'd like to know how the controller from the storage array will be decoupled and added to the software layer. Basically, I'd like to understand the new SDS architecture and how it will function. Thanks!
Posted by: Mani | April 05, 2013 at 06:38 AM