Anytime a new concept enters the marketplace, the inevitable Definition Wars begin.
Understandably, it takes a while for the industry to collectively wrap its head around a consensus perspective regarding a new subject.
When it comes to one of my personal hot topics (software-defined storage), I do try to be patient, but I fail. I despair as I encounter many of the flaccid definitions currently in vogue.
Maybe my standards are too high?
Making matters more irritating is the seemingly endless army of chirpy storage marketing types looking to slap a fresh label on familiar products and technologies — inserting even more noise into an already weak signal.
In this post, I’m going to resist the temptation to assert a black-and-white definition and dismiss all others. The reality is that software-defined storage is a cluster of related concepts: some fundamental, others more optional depending on requirements.
Depending on your situation, good enough may be good enough.
As I walk through the list here, I’d encourage the motivated reader to assemble their own list of the attributes they consider important in their emerging environment.
Essential: Programmability Of Behavior
If nothing else, software-defined storage should be capable of being entirely under external programmatic control.
That’s a long-winded way of saying that anything that can be done by a human can be done via an external program, presumably using an API.
This is the essence of software-defined: behavior is determined by external software control — typically an automation framework. You’ll find essentially the same concept in software-defined networking and software-defined compute (e.g. server virtualization).
Without this one key attribute, there won’t be the basis for effective coordinated automation, and that’s one of the primary goals here. Even better if a single API framework could be used to control many flavors of storage vs. one per vendor.
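To make this concrete, here’s a minimal sketch of what external programmatic control might look like. The endpoint, token, and payload fields are hypothetical placeholders rather than any particular vendor’s API; the point is simply that anything an administrator can do by hand is reachable from an external program.

    # A minimal sketch of external programmatic control of storage.
    # The endpoint, token, and request/response shapes are hypothetical;
    # substitute whatever API your platform actually exposes.
    import requests

    API = "https://storage.example.com/api/v1"
    HEADERS = {"Authorization": "Bearer <token>"}

    def create_volume(name, size_gb, service_level):
        """Provision a new volume with no human in the loop."""
        resp = requests.post(
            f"{API}/volumes",
            headers=HEADERS,
            json={"name": name, "sizeGB": size_gb, "serviceLevel": service_level},
        )
        resp.raise_for_status()
        return resp.json()["id"]

    def snapshot_volume(volume_id, label):
        """Anything an admin can click, an automation framework can call."""
        resp = requests.post(
            f"{API}/volumes/{volume_id}/snapshots",
            headers=HEADERS,
            json={"label": label},
        )
        resp.raise_for_status()
        return resp.json()["id"]

    # An orchestration framework provisioning storage as part of deploying an app:
    vol_id = create_volume("exchange-db-01", size_gb=500, service_level="gold")
    snapshot_volume(vol_id, "pre-migration")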
Frustratingly, there are “software-defined” storage vendors today that offer little-to-no API capabilities. When pushed, they say their homegrown management tools are all anyone needs — and that’s software, isn’t it?
Make it past this initial (and essential!) hurdle, and we can start to differentiate better (and more useful) forms of software-defined storage.
Very Desirable: Dynamic Vs. Static Resource Model
The vast majority of storage products today use essentially static resource models underneath the covers. The storage administrator must determine — well in advance — what different applications might eventually need, then buy a bunch of stuff, then carve it into pre-provisioned resource pools with different service levels (capacity, performance, protection, etc.), and then advertise it for consumption.
You can see where problems arise.
If an application’s requirement doesn’t fit nicely with one of the pre-established storage service levels, that’s a problem. If an application’s requirements move up or down outside the range provided by the pre-defined storage resource pool, that’s a problem. If actual aggregate demand doesn’t line up with the buckets that have been forecasted, that’s a problem.
Far better if we can pool resources, and use them to dynamically create (and adjust) storage service levels as actual demand materializes.
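To illustrate the difference, here’s a toy sketch (purely illustrative; the names and numbers are invented, and no real product is implied). In the static model the service-level pools are carved long before demand is known; in the dynamic model capacity sits in one shared pool, and a service level materializes only when a request actually shows up.

    # Toy contrast of static vs. dynamic resource models (illustrative only).

    # Static model: service levels pre-carved well before demand is known.
    static_pools_gb = {"gold": 50_000, "silver": 200_000, "bronze": 500_000}

    # Dynamic model: a single shared pool; service levels are created on demand.
    shared_pool_gb = 750_000
    allocations = []

    def provision(app, size_gb, service_level):
        """Carve the requested capacity and service level out of the shared pool."""
        global shared_pool_gb
        if size_gb > shared_pool_gb:
            raise RuntimeError("insufficient free capacity")
        shared_pool_gb -= size_gb
        allocations.append({"app": app, "sizeGB": size_gb, "serviceLevel": service_level})

    # A request that fits none of the pre-defined buckets is still just a request.
    provision("exchange", 120_000, {"latency_ms": 5, "protection": "replicated"})
    provision("archive", 300_000, {"latency_ms": 50, "protection": "erasure-coded"})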
Very Desirable: Storage Services Aligned On Application Boundaries
IT is all about applications, and that’s how users see the world: through the lens of their applications. Ideally, we’d be able to tailor and adjust services (including storage) along precise application boundaries.
More for this application, less for that one.
Today’s storage has scant knowledge of applications or their boundaries; it sees the world as LUNs and filesystems, each one of which offers up a combination of capacity, performance, protection, etc.
A better model would be one where storage containers are aligned to application containers, and storage services can be specified for a given container and no more.
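Here’s a rough sketch of what that alignment could look like (the object names are invented for illustration): one storage container per application, with the service characteristics attached to the container itself rather than to shared LUNs underneath it.

    # Illustrative only: storage containers scoped to individual applications,
    # with services specified at the container boundary and nowhere else.
    from dataclasses import dataclass, field

    @dataclass
    class StorageContainer:
        application: str
        capacity_gb: int
        services: dict = field(default_factory=dict)  # performance, protection, etc.

    # "More for this application, less for that one" becomes a per-container setting.
    exchange = StorageContainer(
        application="exchange",
        capacity_gb=2_000,
        services={"iops_limit": 20_000, "protection": "sync-replication"},
    )
    test_dev = StorageContainer(
        application="test-dev",
        capacity_gb=500,
        services={"iops_limit": 2_000, "protection": "none"},
    )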
Very Desirable: Automation Driven By Application Policy
We all know there’s a big difference between saying what you want, and having to explicitly spell out exactly what’s required to achieve a goal.
In today’s storage world, we typically have to be very explicit: use this much cache/flash, tier under these conditions, protect using a combination of RAID 5 and remote replication, etc.
Far better if we could specify storage requirements in application-relevant terms: things like targeted application response times, desired RPO and RTO, cost sensitivity, compliance requirements, and so on.
Better yet if that same application policy could serve as a manifest for compute, networking, security, etc. and not be unique to storage.
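As a thought experiment, here’s what such a policy might look like (the field names are invented, not any shipping product’s schema): the application states outcomes, an automation layer translates them into the explicit knobs above, and storage is just one section of a broader manifest.

    # Hypothetical application policy: outcomes, not implementation details.
    # An automation layer would translate this into cache, tiering, RAID,
    # replication, and so on; the field names are invented for illustration.
    exchange_policy = {
        "application": "exchange",
        "storage": {
            "target_response_ms": 5,      # what the app needs, not how to deliver it
            "rpo_minutes": 15,            # acceptable data loss
            "rto_minutes": 60,            # acceptable recovery time
            "cost_sensitivity": "medium",
            "compliance": ["retain-7-years"],
        },
        # The same manifest can carry intent for the rest of the stack.
        "compute": {"vcpus": 16, "memory_gb": 64},
        "network": {"isolation": "dedicated-segment"},
        "security": {"encryption_at_rest": True},
    }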
Desirable: Works With External Storage Arrays
There are many who see “software-defined storage” as synonymous with “only runs on commodity servers”.
That’s far too restrictive in the real world: storage arrays will be with us for a very long time indeed, and they represent the vast majority of storage spend these days.
Definitionally excluding them doesn’t make logical sense, nor is it a pragmatic view. To the extent a storage array can support the functions described here, it should be seen as a full participant in any software-defined storage environment.
Desirable: Runs On Commodity Servers
That being said, there’s enormous interest in software-only storage products that use familiar servers to offer storage services. It’s an important option (just like external storage arrays) that our ideal software-defined storage environment should support.
Perhaps now would be a good time to call out the distinction between software-based storage (which runs on commodity servers) and software-defined storage.
As an example, there are more than a few software-based storage products on the market that don’t even begin to qualify as software-defined storage (e.g. no APIs!) using the guidelines presented here.
Ideally, software-defined storage would support both traditional external arrays as well as newer, software-only storage stacks — and use a consistent framework for both.
Somewhat Desirable: Works With Older Storage Arrays
You might find yourself arguing that this attribute should be essential, but let me defend myself: storage gear typically has a useful life of 3-5 years, meaning that in larger shops there’s a conveyor belt of new stuff coming in, and older stuff going out.
In the majority of cases, the stuff you have sitting on the floor wasn’t designed to participate as part of a software-defined storage environment — at least, not without an additional storage abstraction layer.
Somewhat Desirable: Supports Existing Workflows
A big benefit of software-defined storage is that it can support converged operational workflows, where all resources are managed by an application-centric policy. That’s the big win with software-defined data centers.
While all that sounds great, the reality is that most of the storage world today runs on manual workflows and tools that have been around for a while. During the adoption period for SDS, it’s somewhat advantageous to support some of the existing processes while experience is being gained with the newer ones.
That being said, there seems to be no shortage of greenfield environments being stood up with clear breaks from legacy approaches, hence my belief that this attribute is somewhat optional.
The VMware Perspective
For those of you who know my situation, you might think I’m simply reciting the VMware party line when it comes to software-defined storage: VSAN, vVols, SPBM, vCAC et al.
I guess I’m fortunate that my personal perspective and my employer’s perspective overlap. That’s the big reason I work where I do.
In my eyes, VMware “gets” software-defined: for compute, for networking — and now for storage.
I’d love nothing more than to spill the beans on what’s being worked on over the next few years. We’re not just talking about incrementally better tech; we’re talking about entirely new models.
All I can say is that I’ll have plenty to blog about for years to come.
If You’ve Made It This Far ...
... you’re probably intrigued with this stuff, just as I am.
Now is a good time to create your own informed opinion on what software-defined storage might mean to you and your organization. At this point, though, you’ll have to be good at filtering noise, and amplifying the important signals.
There's some history here that's worth considering. As we all know, what it meant to be a “server expert” changed dramatically when virtualization became prevalent.
And it’s safe to say the same thing is going to happen in storage.
----------------------
Simply a fantastic post - well said Chuck
-v
Posted by: Vaughn Stewart | June 11, 2014 at 09:59 AM
Sorry, Chuck, with all due respect, but software-defined means exactly what the word says. Standard, virgin, nameless, undefined hardware stem cells that find their eventual groove by means of the genetic material - the software - expressing its functionality. The reason why users are so enthusiastic is because it promises to finally bring back the generic simplicity that got lost in the Stovepipe Era. The proposed use of legacy storage arrays and other pointless complications are simply antithetical to that value proposition and will ruin rather than rescue the desired quantum improvements in TCO.
Posted by: Paul Carpentier | June 12, 2014 at 09:28 AM
Hi Paul -- good to hear from you again.
We both know that storage arrays are an extremely popular form-factor these days, accounting for the vast majority of storage spend. Any definition that categorically excludes them on simplistic philosophical terms is naive and impractical.
Digging deeper, almost all arrays use commodity hardware these days, and are differentiated by their software stacks "expressing functionality" as you put it. So, on the face of it, you're discriminating against one class of storage stack vs. another.
Although in a world where the same storage stack is available either (a) packaged with hardware, or (b) running nicely in a virtual machine, it's not clear to me what your argument would look like.
As an example, if someone decides to offer your Caringo storage software pre-packaged with hardware and supported as a unit, does that make your product "bad"?
-- Chuck
Posted by: Chuck Hollis | June 12, 2014 at 01:40 PM
Chuck - Great post and lots to think about.
IMO - in addition to being dynamic, one of the key characteristics of software defined storage is when do you bind the application policies to the underlying storage - when the application starts or every time the application processes a data set? For example, when MS Exchange Email starts and all storage that belongs to MS Exchange has the same policy bound to it, or policy binding on the underlying storage happens when MS Exchange processes CEO email vs. a spam email? This leads to the question is the policy related to the application or is it related to the application data? Also, about the definition of dynamic - it means different policy can be assigned to the same storage over time, right? For example - CEO's email from 7 years ago may not have the same policy as his/her email 7 hours ago?
Some things to think about as we create the software defined storage infrastructure.
Posted by: Suresh Jasrasaria | June 18, 2014 at 10:05 PM
You bring up a good question.
In my book, the application is responsible for knowing what it wants, in this case Exchange. If Exchange is smart enough to request differentiated storage classes (e.g. one for current email, one for spam, one for retention, etc.) the infrastructure should fulfill those requests. More sophisticated enterprise apps usually have multiple storage spaces, and make decisions about what goes where.
I've had lengthy experiences with the other approach, which is for IT administrators to attempt to impose "application knowledge" externally. It's messy, messy stuff.
Posted by: Chuck Hollis | June 19, 2014 at 01:07 PM