At one time, almost all storage was directly attached to servers. Whether that was through disk drives in the server enclosure itself, or external racks directly connected, there was a clear 1:1 mapping between server, application and dedicated storage resources.
Over the last twenty years, the pendulum has swung in the opposite direction: intelligent external storage arrays that share resources across multiple applications and servers. I can still remember back in 1995 trying to convince people that shared storage was better than the dedicated kind, and it certainly wasn't easy :)
But now, there are clear signs that the pendulum has started to move back: intelligent shared storage, but without the familiar external storage array.
And I've now been asked more than once -- are servers the new storage?
The idea of using familiar, commodity-based servers to provide shared storage services has repeatedly become popular, and then faded. But it's not the same this time: the motivations are very different.
Why Would You Want To Use Servers As Storage?
The idea of using standard industry servers as the basis of a storage array isn't exactly new.
Take any tour of the storage array marketplace, and you'll find many designs that are achingly familiar: two or more server nodes with low-latency connections and intelligent software that turns commodity hardware into a useful storage array.
For example, at the recent VMworld there were perhaps 20 different storage array vendors, and my informal survey had most offerings using this approach.
But it's important to note that the model hasn't changed: storage array vendors still sell and support a box, it's just made of familiar components.
What about an entirely different model -- one where customers source their own hardware, and simply install software to transform servers into shared storage?
I think there are four powerful forces that virtually guarantee we'll see much more of this before long.
The Performance Argument
Storage performance these days boils down to flash -- pure and simple. You pay the extra money to get great gobs of speed.
Put a small amount of flash in an external hybrid array, and you get a significant performance boost that makes most applications run faster -- just how much depends on the effectiveness of the algorithms being used. Go farther and create an all-flash external array, and you get predictably stellar performance for all applications that use it.
But put that same flash on a server bus, and it gets much faster and cheaper than any external array: lower latency, and no need for a dedicated enclosure. Move that same flash technology (or its successor, like PCM) to the motherboard, and it gets even faster and cheaper.
Flash wants to be as close to the CPU as possible, for all the right reasons. Like many, I see Cisco's recent acquisition of Whiptail through this lens.
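To make the latency point concrete, here's a rough back-of-envelope sketch in Python. The numbers are assumptions of mine -- order-of-magnitude only, not measurements of any particular product -- but they show why shortening the transport path matters so much once the media itself is fast.

```python
# Back-of-envelope: where the time goes on a small read, using rough,
# assumed numbers (orders of magnitude only -- not vendor measurements).

FLASH_READ_US = 100  # assumed raw flash read service time, in microseconds

# Assumed round-trip "getting there" overhead for each placement
placements = {
    "external array over a SAN": 500,   # HBA, fabric hops, array controller
    "PCIe flash card in the server": 20,
    "flash/PCM on the memory bus": 1,
}

for name, transport_us in placements.items():
    total_us = FLASH_READ_US + transport_us
    overhead = 100.0 * transport_us / total_us
    print(f"{name:30s} ~{total_us:4d} us per I/O ({overhead:.0f}% spent on transport)")
```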
Storage software is needed to make all of this work, of course. Almost all of the software solutions today use server flash for cache, and don't try to deal with the complexities associated with persistent, resilient shared storage. But that will change before long.
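Here's a minimal sketch of why those two problems differ so much in complexity. This is purely my illustration, not any vendor's design: a write-through cache can lose its server without losing data, while server-resident persistent storage has to replicate every write to peers before acknowledging it.

```python
# Minimal sketch (illustrative only -- not any particular product's design).

class WriteThroughCache:
    """Server flash as cache: the external array stays the authoritative copy,
    so losing this server costs you nothing but a warm cache."""
    def __init__(self, backing_array):
        self.backing_array = backing_array   # a dict standing in for the array
        self.cache = {}

    def write(self, block, data):
        self.backing_array[block] = data     # array still owns durability
        self.cache[block] = data

    def read(self, block):
        return self.cache.get(block, self.backing_array.get(block))


class ResilientServerStore:
    """Server flash as primary storage: a write isn't safe until enough peer
    servers hold a copy, which drags in replication, failure handling and
    rebuild logic that a simple cache never needs."""
    def __init__(self, peers, copies=2):
        self.peers = peers                   # other servers' stores (dicts here)
        self.copies = copies
        self.local = {}

    def write(self, block, data):
        self.local[block] = data
        acked = 1
        for peer in self.peers:              # replicate before acknowledging
            peer[block] = data
            acked += 1
            if acked >= self.copies:
                return True
        return acked >= self.copies          # too few peers: not durable yet

    def read(self, block):
        return self.local.get(block)
```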
Bottom line: flash economics will strongly favor a server-resident approach over time. And that means that technology decisions and implementations will be mostly owned by server teams, and not storage teams.
The Convergence Argument
When well-established technology categories collapse into a single entity, we call that convergence. Convergence is generally a good thing for IT customers: fewer moving pieces, integrated workflows, etc.
As an example, consider VMware's VSAN, now in beta.
The hardware is most certainly converged -- there's no need for dedicated storage hardware or resources; it's the exact same stuff you use for compute. But -- more importantly -- the operational model is converged as well. Storage is now a simple extension of what VMware administrators do day-in and day-out. No real need for a dedicated storage team.
The Pooling Argument
Today, we generally have one pool of resources for compute, and a separate pool of resources for storage. What if they were the same pool? The bigger the pool, the better you can do at optimizing resources. Look inside all those dedicated storage arrays, and you'll usually find a boatload of dedicated compute, memory and network ports.
The ability to consider both server resources and storage resources as one, shared compatible pool -- well, that's an attractive proposition from both an economic and operational perspective.
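A toy example makes the pooling math tangible. The demand curves below are made up (randomly generated, purely illustrative), but the point holds in general: sizing two separate silos for their individual peaks always costs at least as much as sizing one shared pool for the combined peak.

```python
import random

# Toy illustration of the pooling argument, using made-up demand curves.
random.seed(1)
HOURS = 24 * 30                                                   # a month of hourly samples
compute_demand = [random.uniform(20, 80) for _ in range(HOURS)]   # arbitrary units
storage_demand = [random.uniform(10, 60) for _ in range(HOURS)]

# Separate silos: each must be sized for its own peak.
siloed = max(compute_demand) + max(storage_demand)

# One shared pool: sized for the peak of the combined demand.
pooled = max(c + s for c, s in zip(compute_demand, storage_demand))

print(f"two separate pools: ~{siloed:.0f} units of capacity")
print(f"one shared pool:    ~{pooled:.0f} units for the same workloads")
```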
The Simplicity Argument
Anyone who's worked in a multi-vendor environment realizes that -- from an operational perspective -- the less diversity, the better.
Being able to use the same set of building blocks for both your server farm and your storage farm -- all things being equal -- is a win: fewer vendors to deal with, greater buying power, standardized tools and operational processes, and so on.
The Counter Arguments
Where this line of thinking tends to break down is when considering very large amounts of storage.
Yes, flash is getting cheaper, but it will never be as inexpensive as rotating disks, no matter what those startups are telling you :)
Data volumes in the data center continue to outstrip media density advances -- even with efficiency techniques like dedupe -- so it's a safe bet that we'll see many more spinning disks in the future.
Simply put, standard server designs aren't optimized to house very large quantities of spinning disks. They're usually designed for efficient compute, with internal storage as a far lower priority. The density isn't there, the RAS isn't there, the power and cooling isn't there, and so on. As a result, they can be very, very inefficient at scale.
As evidence, consider the latest wave of new mid-tier storage arrays, where it's commonplace to rack 1000 or more drives behind a pair of storage controllers, all in a maximally dense enclosure. Try doing that using commodity servers for storage, and you'll usually end up with a sub-optimal solution.
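Some rough arithmetic shows why. The form factors below are assumptions of mine (a 60-drive 4U shelf versus a 24-drive 2U server), but the shape of the answer holds: the commodity route burns more rack space and drags along a full server's worth of CPUs and power supplies for every couple of dozen drives.

```python
import math

# Rough rack-space arithmetic for ~1000 drives.
# Form factors are assumptions, chosen only to illustrate the comparison.
DRIVES = 1000

dense_shelf_drives, dense_shelf_u = 60, 4    # assumed purpose-built dense enclosure
server_drives, server_u = 24, 2              # assumed drive-heavy commodity 2U server

dense_total_u  = math.ceil(DRIVES / dense_shelf_drives) * dense_shelf_u
server_total_u = math.ceil(DRIVES / server_drives) * server_u

print(f"purpose-built shelves: {dense_total_u} U of rack space")
print(f"commodity 2U servers:  {server_total_u} U, plus "
      f"{math.ceil(DRIVES / server_drives)} sets of CPUs, memory and PSUs")
```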
It seems pretty clear to me that -- at scale -- we'll see future storage architectures fall into two distinct categories, each optimized for purpose.
For performance, we'll see server-resident flash, augmented by low-latency interconnects and smart storage software. That's where you're going to get your best bang-for-the-buck with IOPS. For capacity, we'll see vast, scale-out farms of purpose-built storage controllers (albeit with familiar components) that deliver amazing capacity and bandwidth as needed.
Presumably, the two domains would closely communicate: moving data back and forth using a variety of services and semantics.
Are Servers The New Storage?
At decent scale, the answer appears to be "yes", especially for the performance tier, driven by the need for cost-effective flash very close to the application.
At more modest scale, though, the answer is more "I'm optimistic". The arguments are powerful, but you never know for sure until potential customers check in. Personally, I'm watching to see how people react to the VSAN beta -- as it puts into play all of the core arguments I listed above.
One thing is for sure -- there will always be more data to deal with, which means we'll always be looking for better ways to deal with it.
--------------------------
Chuck, from the first time I met you -- when you convinced 100's at EMC that selling Enterprise Storage made sense for customers -- you have been ahead of your time ever since.
Server Flash from Virident used by VMW and EMC helps apps sing.
Posted by: Ken Grohe | September 18, 2013 at 01:28 AM
I'm in the camp of agreement. At the VMUG in Houston yesterday I got my first exposure to SimpliVity, which has fully adopted this as the path forward (though I didn't get details on scale limitations). EMC's own Isilon has pushed this case by being able to run HDFS directly on what was historically just a storage platform. And Avamar takes the same approach for a purpose-built backup appliance that also runs all the software of the backup application.
These architectures seem to provide a simpler management model, but also more redundancy: N+1, N+2, N+3... "how many failures do you want to protect against?" types of options that a dual-controller array just can't offer.
Posted by: Nick Ryan | September 20, 2013 at 10:33 AM
I seem to have missed something -- the point of external storage has always been the need for high availability, which was typically achieved through the clustering of compute nodes, and those clusters have always required external storage. Storage arrays took over from DAS (in the mainstream) when cost-effective arrays became a viable alternative to buying/supporting/hosting the 10s or even 100s of DAS JBODs that a single array could replace.
Admittedly, VMware VSAN appears to address this. However, it will need to demonstrate its robustness and performance before the market moves away from external storage; and judging by the problems I have seen with lower-tier offerings such as the VSA, that may still be some way off.
Posted by: Gregski | October 22, 2013 at 10:44 PM