As EFDs (enterprise flash drives) find their way into more and more customer environments, we're learning that "conventional wisdom" has to change in a few subtle -- but important -- regards.
Old wisdom -- hot spots are bad!
New wisdom -- hot spots are good!
What Brought This On?
I saw the press release today from Oracle Open World about EMC partnering with Oracle on the use of EFDs to make databases run faster.
As with most press releases, I thought it left most of the juicy bits out, so maybe I can fill in a few pieces?
The positive results of customers using EFDs in production environments keep rolling in. Every time I read (or hear) some of the ridiculous FUD that's being passed around, I just have to smile a bit ... because the results are so consistently spectacular.
If you've got a highly visible application that does significant I/O, we should talk.
Especially if it's one of those highly visible, someone-would-get-a-promotion-if-it-ran-twice-as-fast kind of applications :-)
Oracle and Hot Spots
Most database architects I've talked to treat hot spots -- regions of the database saturated with I/Os -- as something to avoid if at all possible.
In a world where your application is only as fast as a spinning disk's ability to respond to I/O requests, you go to great lengths to evenly spread out the I/O load across as many spindles as you can reasonably ask for.
Transaction logs are the most obvious culprit, but in truly large applications, much thought is given to how tables and indexes are organized and configured, simply to avoid this problem.
Indeed, many people believe that the RAID 5 vs. RAID 1 discussion is all about availability.
Not entirely -- a given usable capacity on RAID 1 will usually offer higher levels of I/O performance, simply because it has more spindles, and in good mirroring implementations, either disk can serve a read request.
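To put rough numbers on the spindle argument, here's a minimal sketch. It assumes same-size drives, RAID 1 built from mirrored pairs, and RAID 5 built from 3+1 groups; the 146GB drive size and the usable-capacity target are illustrative choices, not figures from any particular configuration.

```python
import math

def spindles_needed(usable_gb, drive_gb, data_drives, total_drives):
    """Drives required to reach usable_gb, using RAID groups of
    total_drives each, of which data_drives' worth holds user data."""
    groups = math.ceil(usable_gb / (drive_gb * data_drives))
    return groups * total_drives

usable_gb = 1752  # hypothetical target: twelve 146GB drives' worth of data

# RAID 1: each mirrored pair yields one drive of usable capacity.
raid1 = spindles_needed(usable_gb, 146, data_drives=1, total_drives=2)
# RAID 5 (3+1): each group of four yields three drives of usable capacity.
raid5 = spindles_needed(usable_gb, 146, data_drives=3, total_drives=4)

print(raid1, raid5)
```

Same usable capacity, but under these assumptions the mirrored layout puts half again as many spindles behind it, and every one of them can service reads.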
With EFDs, Hot Spots Are Good Things, Not Bad Things
Enterprise flash drives are scary fast. They're also very expensive when compared to, say, a 146GB 15K FC drive.
So the trick is to use as little EFD as possible to gain the greatest possible effect.
And we're finding -- surprisingly -- the hotter the hot spot, the bigger the benefit with EFDs.
Hot spots are nothing more than I/Os queued up against a storage device. EFDs can handle 30x or more IOPs -- at a far lower response time -- than any disk.
So, the goal ends up being "throw as many IOPs as you can at these suckers, because that's what they're good at". And every IOP gets returned in a fraction of the time of the fastest disk drive. And every IOP sent to an EFD is one that isn't being sent to a far slower spinning disk.
Indeed, it's starting to look like it might make sense to re-jigger your application to produce *more* hot spots -- simply because doing so creates clear targets for I/O acceleration. The tighter and more extreme your hot spot profile is, the less EFD is required to make things really, really smoke.
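One way to see why a tight hot spot profile needs less EFD is to rank regions by I/O per GB and add them up until you've covered the share of I/O you care about. The sketch below does exactly that. The function name, the (size, I/O count) extent format, and both sample workloads are made up for illustration; a real profile would come from whatever your array or database tooling reports.

```python
def efd_capacity_needed(extents, target_fraction):
    """extents: list of (size_gb, io_count) tuples.
    Greedily places the extents with the highest I/O per GB first and
    returns (gb_placed, fraction_of_io_captured) once the target is met."""
    total_io = sum(io for _, io in extents)
    ranked = sorted(extents, key=lambda e: e[1] / e[0], reverse=True)
    gb_placed = 0.0
    captured = 0
    for size_gb, io in ranked:
        if captured >= target_fraction * total_io:
            break
        gb_placed += size_gb
        captured += io
    return gb_placed, captured / total_io

# Hypothetical workloads, each 1000GB and 10,000 I/Os in total.
tight = [(10, 9000), (500, 500), (490, 500)]   # one small, very hot region
flat = [(10, 100), (500, 5000), (490, 4900)]   # I/O spread evenly per GB

print(efd_capacity_needed(tight, 0.8))
print(efd_capacity_needed(flat, 0.8))
```

With the tight profile, 10GB of flash covers the target. With the evenly spread profile, the same target can't be met without putting essentially everything on flash.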
But it's counterintuitive to most database types. And I think it's going to take a while before the thinking comes around.
"No, you WANT hot spots. Yes, they're GOOD things. Yes, we want you to change your database design to PRODUCE hot spots, not avoid them, because we're going to point them at EFDs, and your application will rock! The hotter the better!"
Boy, that's gonna be a hard conversation to have with people.
The Storage Guys Have To Rethink As Well
Most garden-variety storage design runs along the same lines of thinking. Hot spots are bad things. Take great pains (striping, etc.) to avoid them.
Well, with EFDs, the logic is now inverted.
If you, as a storage administrator, can figure out how to set things up such that the majority of your I/O load shows up in as small a capacity as possible, you win! -- you can buy a very small amount of EFD, and make everything on the array run much, much faster.
Unless, of course, you've got one of those spindle-randomizing arrays.
What's A Spindle Randomizer?
Several storage arrays on the market are designed ground-up on the principle of spreading the I/O load across as many spindles as possible.
On traditional arrays (e.g. DMX and CX), we do this with various forms of striping on subsets of the capacity, but some vendors have gone the extra step and made it an architectural fundamental of their array design.
Put simply, you can't turn it off.
So, when we consider arrays like the HP EVA and NetApp FAS and XIV and a few others, the assumption is that "good" is randomizing I/O loads automatically across as many spindles as possible.
You know, avoid hot spots ... :-)
Now, prior to EFDs, we could argue back and forth as to whether or not that was a good choice.
I tend to view with suspicion any advanced feature that can't be turned off. And I am a strong believer in retaining the ability to precisely place certain kinds of data in certain places -- if you should have the need.
But in a world where EFDs are very real, their performance advantages are very real, and so on -- well, folks, that just looks like a really unfortunate architectural choice to me.
Indeed, if you followed the debate surrounding my recent capacity posts, several HP EVA types described the box as a "bucket of IOPs" that you just carved pieces out of.
Well, that sort of approach completely defeats the benefits of EFDs, unless -- of course -- you're planning to buy a whole pile of them. Yeah, right.
Indeed, in the past I've seen all sorts of benchmark tomfoolery with the spindle randomizers -- typically, they'll take a very large amount of capacity, configure a small amount to be used for the benchmark (SPC-1 comes to mind!), and get *amazing* results when all the spindles are engaged.
EMC could play that game, you know, jam a whole bunch of EFDs into a storage device and claim something ridiculous like a million IOPS knowing full well no one would ever do such a thing.
No, wait, that's already been done :-)
How Does Performance Optimization Change In The New World Of Flash?
Ideally, you could take a 3+1 RAID group of 73GB EFDs, and carve them into all sorts of small LUNs.
Take this tiny sub-LUN and point it at an Oracle redo log. Take that tiny sub-LUN and point it at Exchange logs. Take this other sub-LUN and point it at very hot indexes.
And so on. A small amount of EFD carved into small pieces can make many, many applications on a single array run much faster -- if and only if you have great control as to where I/Os land.
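As a back-of-the-envelope sketch of that carving exercise: assuming the 3+1 group is RAID 5 (three drives of data, one of parity) and ignoring formatting overhead, there's roughly 219GB to hand out. The LUN names and sizes below are hypothetical, and `carve` is just an illustration of the bookkeeping, not any array's provisioning interface.

```python
def carve(usable_gb, requests):
    """Allocate LUNs back to back from one pool.
    requests: list of (name, size_gb) tuples.
    Returns ({name: (start_gb, size_gb)}, free_gb)."""
    layout = {}
    offset = 0
    for name, size_gb in requests:
        if offset + size_gb > usable_gb:
            raise ValueError(f"not enough EFD capacity left for {name}")
        layout[name] = (offset, size_gb)
        offset += size_gb
    return layout, usable_gb - offset

usable_gb = 3 * 73  # three data drives' worth in a 3+1 group (assumed RAID 5)

layout, free_gb = carve(usable_gb, [
    ("oracle_redo", 16),     # hypothetical sizes, chosen only for illustration
    ("exchange_logs", 24),
    ("hot_indexes", 60),
])
print(layout, free_gb)
```

Three different applications get their hottest pieces accelerated, and there's still more than half the group left over for the next candidate.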
Good luck trying this approach with one of the spindle-randomizing arrays. If someone figures it out, let me know.
I can't see how it'd be done.
Is It Any Surprise?
The folks at these other storage companies aren't dummies. They've figured out what EFDs can bring to the game, and are trying to figure out what to do about it. Especially since EMC is offering them successfully today.
Those storage vendors with straight-ahead architectures (e.g. Hitachi, et al.) can simply view EFDs as another kind of disk drive, albeit a very special kind of disk drive. In the long term, they'll be OK with this industry transition.
But those vendors who got very clever with trying to shuffle data around between disks automagically now have a world-class engineering headache on their hands -- these designs were conceived and built before anyone considered that -- someday -- there'd be EFDs, and customers would want them.
Which is exactly the case today, as I see it. Throwing FUD around only lasts so long before the reality gets out.
I guess they'll have to make it all about something other than performance, or significantly alter their designs to create I/O isolation capabilities.
Or maybe wait a few years until the price of EFDs comes way down.
Either way, it should be interesting to see how it all plays out ...
Courteous comments welcome as always!

@Chuck: "Good luck trying this approach with one of the spindle-randomizing arrays. If someone figures it out, let me know."
If I understand it right, one way is to maintain a list of most frequently used blocks either in the file systems or in the block layer within the operating systems. Once the access count for the blocks in this list crosses a threshold, move or re-map them into the EFDs.
The point is, don't let the blocks that constitute the hot-spots reach the spindle-randomizers at all.
Posted by: Shehjar Tikoo | September 22, 2008 at 11:39 PM
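For readers who want to see the idea in the comment above in concrete form, here is a minimal sketch of threshold-based promotion. Everything in it is an assumption made for illustration: the class name, the fixed number of EFD slots, and the simple access counter. It is not how any shipping array or file system does it, and it deliberately leaves out demotion and aging of counts.

```python
from collections import Counter

class HotBlockMap:
    """Counts accesses per block and re-maps a block to EFD once its
    count reaches a threshold, as long as EFD slots remain."""

    def __init__(self, threshold, efd_slots):
        self.threshold = threshold
        self.efd_slots = efd_slots
        self.counts = Counter()
        self.on_efd = set()

    def access(self, block):
        """Record one access and return the tier that serves it."""
        if block in self.on_efd:
            return "efd"
        self.counts[block] += 1
        if self.counts[block] >= self.threshold and len(self.on_efd) < self.efd_slots:
            # Promote: this access still came from disk, later ones won't.
            self.on_efd.add(block)
        return "disk"

m = HotBlockMap(threshold=3, efd_slots=1)
first = [m.access(7) for _ in range(4)]   # block 7 gets hot and is promoted
second = [m.access(9) for _ in range(4)]  # block 9 gets hot, but EFD is full
print(first, second)
```

Note what happens to the second block: once the flash is full, a newly hot block stays on disk, so the results depend heavily on the threshold and on whatever eviction policy gets added.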
You're right -- something like that will be needed. And it will only be as good as the algorithmic assumptions the designers make ...
Thanks for writing!
Posted by: Chuck Hollis | September 22, 2008 at 11:45 PM
Chuck, I was working for ConvergeNet (remember them?) about 10 years ago and they had the notion of using large amounts of flash memory as exported LUNs. It didn't work then for a number of reasons. One of the things I liked about the design was the possibility of "pinning" certain data loads to flash memory.
The thing I didn't like was the lack of management tools to help customers re-use the expensive flash resources for different applications at different times of the day or week or month.
Is there anything in the EFDs today that provides this type of device migration? This is one of the advantages of using cache memory - because it is usable by all applications, not just a couple.
Posted by: marc farley | September 23, 2008 at 03:07 PM
Hi Marc
We achieve that to a certain degree with products like SymmOptimizer that can identify hotspots in workload patterns, and transparently rearrange things as needed.
But I think much, much more is possible ... so stay tuned!
Posted by: Chuck Hollis | September 23, 2008 at 11:51 PM
Hey Chuck. Here's a macro question for you: with the demise of several large institutions on Wall Street that were the primary targets for the EFDs in the Symmetrix array, are you afraid that market adoption may have hit a wall here until the economy turns?
Posted by: flashguy | September 25, 2008 at 08:25 AM
Interesting and valid question!
I guess I'd be tempted to answer "not really" -- there are still plenty of business models out there (financial and otherwise) where speed equals money.
The trick is finding the person who can connect the dots, e.g. "gee, this application over here makes us a bunch of money, if it ran 2x-3x as fast, we'd make more money".
And the larger the organization is, the harder this seems to be.
-- Chuck
Posted by: Chuck Hollis | September 25, 2008 at 11:24 AM