Big news today for IT infrastructure architects -- and the businesses that depend on them.
Today's "Lightning Strikes" announcement from EMC represents a new milestone in giving our customers a new-found ability to deploy advanced infrastructure that's more efficient, more optimized -- and a helluva lot faster :)
It's a big deal. There's a lot to cover, so I'm breaking this into three posts.
In this post, I'm going to set context -- what the problem is, what server-side flash brings to the party, what challenges exist with existing approaches, and so on.
In the next post, I'm going to dive into the details of VFCache (formerly known as Project Lightning): what it does, how it's different, what's there now, what's down the road, etc.
And, in my final post, I'm going to introduce you to Project Thunder -- which *hasn't* been as widely talked about -- but, as you'll see, is a natural extension of what Project Lightning (now VFCache) is targeting -- only at scale :)
If you're into IT infrastructure at any level, go get a cup of coffee, and let's dig in together. Yes, there's a lot to digest, but you'll probably end up just as impressed as we all have been internally for a while now.
To Start With
If you're into computing hardware, you know about the perpetually widening gap between CPU performance and storage performance.
CPUs (as best exemplified by Intel) follow Moore's Law. Every 18 months or so, we all get a big, healthy boost of CPU performance for roughly what the previous technology cost.
It's one of the best deals in tech :)
Not so when it comes to traditional storage performance. IOPS/spindle (the primary measure of rotating disk performance) has been largely limited to discouragingly modest performance increases over that same time period.
To put it mildly, this technology gap has always frustrated our customers. Buying faster servers won't necessarily deliver faster performance if reading or writing data becomes the bottleneck.
There are two historical approaches to delivering more storage performance. The first is to use lots of spindles and spread the workload across them -- but that gets expensive quickly, both in wasted capacity and in increased footprint. The other has been to use large, battery-backed DRAM caches and smart algorithms, such as you'll find in the VMAX. Better, but still not ideal.
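To make the "lots of spindles" tradeoff concrete, here's a quick back-of-envelope sketch. The per-drive figures are illustrative assumptions (ballpark numbers for a 15K RPM drive), not benchmarks of any particular product:

```python
# Rough illustration of why "buy more spindles" gets expensive fast.
# Per-drive numbers below are illustrative assumptions only.
target_iops = 50_000            # what the application needs
iops_per_spindle = 180          # ballpark for a 15K RPM drive
capacity_per_spindle_tb = 0.6   # a 600 GB drive
capacity_needed_tb = 10         # what the application actually stores

spindles_for_iops = -(-target_iops // iops_per_spindle)  # ceiling division
capacity_bought_tb = spindles_for_iops * capacity_per_spindle_tb

print(f"Spindles needed for IOPS alone: {spindles_for_iops}")   # ~278 drives
print(f"Capacity bought: {capacity_bought_tb:.0f} TB to store "
      f"{capacity_needed_tb} TB")                               # ~167 TB
```

In other words, you end up buying (and powering, and cooling) an order of magnitude more capacity than you need, just to get the spindle count up.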
The technology answer to this -- as most people know -- lies in adapting flash memory to enterprise storage requirements.
While not entirely a new idea, its application is evolving fast -- as we'll see in a moment.
EMC Gets Onboard Early
We started taking an extreme interest in enterprise storage flash back in 2006. This resulted in EMC shipping the industry's first enterprise flash storage technology (as part of the VMAX) in early 2008.
While it was a game-changer in its own right, we ended up having to do an inordinate amount of missionary work -- made even more difficult by the extreme FUD from all the storage vendors who were caught flat-footed by this industry shift. At the time, we described enterprise flash as a new performance tier, which it was.
If I played back for you the silly, dangerous and ignorant things that were being said about the technology at the time, you'd probably laugh -- until you cried. While that's past history, it does illustrate what usually happens when a major vendor attempts to advance a significant technology considerably ahead of its peer group :)
The 2008 technology was quickly followed in 2009 by Fully Automated Storage Tiering (FAST), which provided the important capability to automatically tier information between high-performance enterprise flash and more traditional rotating rust. First on the VMAX, and then on the CLARiiON, the game had advanced significantly.
In 2010 VNX went even further by introducing FAST Cache, which used the technology as a big hunk of non-volatile storage cache that (unlike competitive alternatives) could be used for both reads and writes, as -- obviously -- most everyone does both :)
The Magnifying Effect Of FAST
Since EMC's implementations allow a modest amount of enterprise flash to positively impact a far larger pool of capacity, we tend to keep track of two numbers. The first, obviously, is the total amount of enterprise flash we've shipped.
The current tally is ~24 petabytes (24,000 terabytes) -- no small number.
The more interesting number is the leverage effect: that enterprise flash is dramatically improving the performance of 1.3 exabytes -- 1.3 million terabytes -- of customer capacity. A little quick division shows a "multiplier effect" of roughly 54x: on average, for every terabyte of flash storage purchased, you're cranking up the performance of ~54 terabytes.
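If you want to sanity-check that multiplier yourself, the arithmetic is just the ratio of accelerated capacity to flash shipped (rounded figures from above):

```python
# Back-of-envelope check of the "multiplier effect" quoted above.
flash_shipped_tb = 24_000            # ~24 PB of enterprise flash shipped
capacity_accelerated_tb = 1_300_000  # ~1.3 EB of customer capacity it touches

multiplier = capacity_accelerated_tb / flash_shipped_tb
print(f"Average multiplier: ~{multiplier:.0f}x")  # -> ~54x
```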
Or, as we've long said, a little bit of flash can make a huge difference :)
And Considerable Contextual Intellectual Property
Often under-appreciated is the considerable amount of organizational expertise we've built up around applying this technology to customer environments.
For example, we've modeled tens of billions of I/Os from actual customer application traces. We've taken the technology into many, many thousands of customer environments and documented the results.
The net result is that we've become *extremely precise* in quantifying specific customer benefits around their actual use cases. Not to point out the obvious, but that sort of intellectual capital just isn't available to smaller vendors, those that are bound to an antiquated architecture, or larger vendors that treat storage as an interesting side business.
But There's More To Do ...
Despite all the success we've had, we're quite aware that all of our efforts -- up to this point -- have been "in-the-box". Yes, we've made classic storage arrays faster, more efficient, easier to use, and so on.
And to take that next step, we have to redraw the "storage domain boundary" to include the server. Some outside observers have (mistakenly) thought that would be a culturally hard thing for EMC to do, but -- then again -- they don't work here, do they? :)
The motivation for server-side flash storage is clear and compelling -- it's a heck of a lot faster than even the very best storage-side flash.
Although the storage media technologies are somewhat similar, the real reason for the performance bump is that you're eliminating the latency (and occasionally the bandwidth constraints) of the "round trip" from server to storage and back again. While there's still a protocol handshake, it now happens at internal PCIe bus speeds vs. I/O channel speeds.
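To see why the round trip matters so much, remember that a single outstanding I/O can complete no faster than 1/latency. The latency figures below are illustrative assumptions for the sake of the arithmetic, not measurements of any particular product:

```python
# Illustrative only: how round-trip latency caps the rate of a single
# outstanding I/O (IOPS per queue slot = 1 / latency).
array_over_san_latency_s = 500e-6   # assume ~500 microseconds server -> array -> server
local_pcie_flash_latency_s = 50e-6  # assume ~50 microseconds to a local PCIe flash card

for name, latency in [("SAN-attached array", array_over_san_latency_s),
                      ("local PCIe flash", local_pcie_flash_latency_s)]:
    print(f"{name}: ~{1 / latency:,.0f} IOPS per outstanding I/O")
# -> ~2,000 vs. ~20,000: similar flash media, but it looks ten times
#    faster simply because the round trip got shorter.
```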
So, why doesn't everyone go out and buy server-side flash cards from their favorite PCIe vendor? Well, more than a few people have gone down that road, but have encountered not-unexpected problems.
First, we're usually talking valuable data here. That means it has to be protected, backed up, replicated either locally or remotely, and so on. The data might be moving faster, but the requirements around data protection haven't changed.
Second, as we've learned from our past, the important secret sauce is getting the right data in the right place at the right time -- without it being a manual process. Smart software is needed to do this. We did that with FAST within the array; we now need to do the same between server and storage (there's a toy sketch of the general idea a bit further down).
Third, flash storage -- like any storage -- is expensive, and you'd like to use as little of it as possible: deduping, pooling and otherwise treating it as you would any other scarce resource.
Finally, when we "cross the chasm" into meaningful enterprise applications, it's not enough to put raw technology out there and encourage people to give it a go.
Enterprise customers expect their vendors to assess their environments, propose complete solutions, quantify the expected benefits, be able to certify a wide range of hardware and software environments, provide deep integration with the other parts of their environment, and -- above all -- be there in full force when there's a problem.
None of this should come as a surprise to anyone :)
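To make the "smart software" point a bit more concrete, here's a deliberately tiny, toy sketch of the general idea -- a write-through read cache sitting in front of a backing array, so the local copy is never the only copy. To be clear: this is my own illustrative sketch for this post, not how VFCache is actually implemented.

```python
from collections import OrderedDict

class ToyWriteThroughCache:
    """Toy illustration only -- not VFCache. A small server-side read cache
    that never holds the sole copy of data: writes always go through to the
    backing array, so existing protection and replication still apply."""

    def __init__(self, backing_store, capacity_blocks):
        self.backing = backing_store        # dict-like stand-in for the array
        self.capacity = capacity_blocks
        self.cache = OrderedDict()          # block -> data, kept in LRU order

    def read(self, block):
        if block in self.cache:
            self.cache.move_to_end(block)   # hit: refresh its LRU position
            return self.cache[block]
        data = self.backing[block]          # miss: fetch from the array...
        self._insert(block, data)           # ...and keep a local copy for next time
        return data

    def write(self, block, data):
        self.backing[block] = data          # write-through: the array copy stays authoritative
        self._insert(block, data)           # keep the cache coherent for later reads

    def _insert(self, block, data):
        self.cache[block] = data
        self.cache.move_to_end(block)
        if len(self.cache) > self.capacity:
            self.cache.popitem(last=False)  # evict the least recently used block
```

Real implementations, of course, have to worry about write ordering, crash consistency, shared and clustered hosts, and a hundred other details -- which is exactly why the software matters at least as much as the card.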
The Stage Is Now Set
Thanks for your patience -- it's now time to dig into the specific details around VFCache: what it is, what it does, and how it's significantly different from what's already out there.
I'd invite you to join me on the next chapter of the story ...
Whether the hardware is LSI or Fusion, how is this materially different from FB's flashcache work (for the generic case, or 11G-R2 for the app-centric case) that's been available for some time now, without paying "enterprise storage" margins? Other than inconclusive summations with condescension and emoticons, where's the value and differentiation?
Posted by: mahargk | February 09, 2012 at 12:20 AM
Rather than repeat my entire post as an answer, I'd encourage you to read the material here.
Thanks!
Posted by: Chuck Hollis | February 09, 2012 at 06:18 AM
Chuck, sure thing -- a PCI flash card sitting on a bus directly across from a processor will allow for faster completion of IO than via a serial link (FC, GE, whatever other link) ... if there exist applications that issue IOs that can saturate serial links. In the past, such applications did not exist (just because of concepts such as READ-READ and READ-WRITE barriers).
16 Gbps serial links can handle IO rates of 2 million IOPS (at a 1000-byte IO size, which is a rather large IO size for OLTP). Hmm, I cannot think of any "application" that can spew out 2 million IOPS. And if this data point is valid, then the argument that the serial link is the choke point is invalid!
Flash cards sitting on a PCI bus are a DAS architecture. And there were a lot of problems with the DAS architecture. I realize that EMC's VFCache spin is not exactly DAS (like FusionIO), but by way of the flash card you have introduced one more moving part and one more management point.
One of the arguments of array based SAN attached storage is that it is centralized! You have one management point and you can deliver a rich set of storage services from this centralized management point.
Another argument I heard in the past was that one array port had more than enough performance capacity so as to be able to provision between 8 and 16 hosts (initiators) to access LUNs via 1 array port (fan in). Are we now saying that array ports can be saturated by ONE server?
Whatever happened to the argument that battery-backed RAM caches in storage arrays are capable of delivering the highest performance needed by any server?
The newer pitch that servers are able to generate IO at much faster rates than an array port is capable of servicing is a contradiction of previous arguments.
Any clarification regarding these seemingly contradictory arguments would help resolve the confusion.
Posted by: Roy DCruz | February 25, 2012 at 01:33 AM