I always warn people -- be careful what you ask for; you just might get it.
For years, various competitors have heckled EMC that we ought to be more active and participatory when it comes to public storage benchmarks.
Today, our Isilon team has released some truly astounding SPEC numbers that should get you thinking about this whole scale-out performance thing in a new way.
There are a lot of juicy insights here -- in addition to the predictable bragging rights -- so let's dig in and see what we've got.
Benchmarks Can Be A Mixed Bag
My real beef with public storage benchmarks over the years is that they're not worth the trouble unless they reflect real workloads and use cases that customers actually care about.
Although I've been critical of the SPC, I've always felt differently about the SPEC -- it has always seemed a cleaner, harder-to-manipulate benchmark that does a reasonable job of reflecting real-world use cases. And, yes, EMC has submitted SPEC results from time to time.
The best way to think about the SPEC test is in terms of simulating a "file workflow".
Imagine you're running a big filer to support, for example, geophysical exploration, image processing, or perhaps digital content creation. At one level, you're sort of using the file system as a database. The workflow changes the state of the file system metadata frequently, in addition to the usual reads and writes.
Nick Kirsch, our director of product management for the Isilon team, tells me that the SPEC workloads do a decent job of capturing the "file-based workflow" use case.
The tests are heavily biased towards metadata operations against the file system: create, open, close, rename, move, stat, etc. The traditional I/O operations that result also have a heavy sequential bias, e.g. seek partway into this file and read a few megabytes, or append this data sequentially to the end of an existing file.
The metadata traffic is significant, the I/O transfers are large, and they're generally sequential.
You get back two numbers to compare: first, the number of file system IOPS, and second, the average response time per operation.
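To make that concrete, here's a toy sketch (in Python) of the kind of loop the benchmark drives -- loudly hedged: this is *not* the actual SPECsfs2008 load generator, and the 70/30 op split, paths, and 1MB transfer size are invented purely for illustration.

    # Toy sketch of a metadata-heavy "file workflow" -- NOT the real
    # SPECsfs2008 load generator. The 70/30 metadata-vs-data split and
    # the 1 MB sequential append are invented for illustration.
    import os, random, time

    def run_workload(root, ops=1000):
        os.makedirs(root, exist_ok=True)
        start = time.time()
        for _ in range(ops):
            path = os.path.join(root, "file%d.dat" % random.randrange(100))
            if random.random() < 0.7:
                # metadata traffic: create/open/close, then stat
                with open(path, "a"):
                    pass
                os.stat(path)
            else:
                # data traffic: sequential 1 MB append to an existing file
                with open(path, "ab") as f:
                    f.write(b"x" * (1024 * 1024))
        elapsed = time.time() - start
        print("ops/sec: %.0f, avg response: %.2f ms"
              % (ops / elapsed, 1000.0 * elapsed / ops))

    run_workload("/tmp/sfs_toy")

The two numbers it prints are the same pair the benchmark reports: an ops/sec rate and an average response time.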
SPECsfs2008 (the official name of the benchmark) comes in two flavors: one for NFS file systems, and one for CIFS (i.e., Windows-friendly) file systems.
Fair warning: the results can't be compared against each other, nor should they be compared against earlier versions of the SPEC test. And, no, please don't assume that SPEC IOPS are anything like plain, ordinary block IOPS :)
The Headlines
The EMC/Isilon results clearly overwhelm every other result to date, in a couple of interesting ways.
- The absolute numbers are much higher (i.e., we're now talking over a MILLION IOPS, something that's brand new)
- The bread-and-butter Isilon model was used (the S200), not the ultra-high-performance version, so if we need to come back and offer up better numbers at some point, we can -- using existing products
- The results were achieved on completely standard versions of the product -- no lab queens!
- There was no performance tweaking or tuning; these are largely "out of the box" numbers
- They show predictable and linear scalability from the very small to the very large
- It was all done on a SINGLE file system, vs. the usual aggregation of multiple ones
All the various charts are below for your perusal. So, let's look at each piece for a moment.
How Much Faster?
The results -- for both NFS and CIFS -- are *substantially* faster than any other submission to date. Not a little bit, a whole lot. That includes EMC's previous record-breaking submissions using the more traditional VNX-based technology.
The Isilon folks are showing the results two ways: the first is a more traditional "aggregate performance" view, where you don't get any credit for doing it all on a single file system. The second, even more favorable way is to show the results on a per-file-system basis.
The ability to do everything in a single, consistent file system matters to many, especially at scale -- we'll talk more about that a bit farther on in this post.
And Using A Standard Model?
Yep. The results were achieved using off-the-shelf S200 modules. Each module had a single SSD for metadata management, plus 23 standard 300GB 10K SAS drives for data storage. There are no great gobs of flash here, and no exotic 15K disk drives either.
It's exactly the same sort of balanced capacity-vs-performance model that many of our customers are choosing for their production environments. Take a look at the configuration reports for the benchmarks -- they're about as clean and simple as any you're likely to see from any vendor.
No lab queen here, folks.
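If you want a back-of-envelope feel for what each module brings, the arithmetic is simple -- my numbers below ignore OneFS protection overhead and formatting, and the SSD holds metadata rather than user data:

    # Raw data capacity per S200 module, from the drive counts above.
    # Ignores protection overhead and formatting; the single SSD is for
    # metadata, so it doesn't count toward user data capacity.
    sas_drives = 23
    drive_gb = 300
    print("raw capacity per module: %.1f TB" % (sas_drives * drive_gb / 1000.0))
    # -> raw capacity per module: 6.9 TB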
And No Tweaking?
I asked Nick exactly how much customization and optimization they had to do to achieve these numbers.
The answer was simple: none.
They did spend some time setting up the environment with instrumentation (using Isilon's InsightIQ) to understand what performance they were getting, and they did play around with a few different protection options, but there's no long list of parameters and procedures to get these results.
Basically, they plugged the stuff in, turned it on, and got the numbers.
Linear Scalability?
One of the beautiful things about the Isilon scale-out shared-nothing approach is its linear scalability. The Isilon team submitted multiple results, each using a different number of nodes. I think the NFS results show what I'm talking about -- the "IOPS per module" number stays basically flat from the smallest to the largest configurations.
Use a little, use a lot -- performance scales linearly and predictably with capacity. Just as you'd want.
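For what "flat IOPS per module" implies, a trivial model does the job -- the per-node figure and node counts below are hypothetical placeholders, not numbers from the actual submissions:

    # Trivial linear-scaling model: if per-node throughput stays flat,
    # aggregate throughput is a straight line in node count. The 8,000
    # ops/sec figure and node counts are hypothetical, for illustration.
    per_node_ops = 8000
    for nodes in (7, 14, 28, 56, 140):
        print("%3d nodes -> %9d aggregate ops/sec" % (nodes, nodes * per_node_ops))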
Not every environment starts out large, but many end up that way. I think it's great to know that you can grow capacity and performance in bite-sized chunks and with predictable results. If you've ever added capacity to an Isilon system, you know it's painfully boring: cable up the new module, stand back, and let the magic happen -- all the resources are rebalanced transparently and nondisruptively.
It's one of the more compelling demos in storage-land.
The Beauty Of A Single File System
Any time you've got multiple containers (such as file systems), you've got to make some hard decisions as to what goes where. If things change, you've got to move things around and rebalance. And, since we're talking petabyte-scale, that in itself can be a massive effort.
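Here's a tiny sketch of that stranded-capacity problem -- the container names and free-space numbers are invented purely for illustration:

    # With separate file systems, every new dataset forces a placement
    # decision, and growth can strand free space in the wrong container.
    containers = {"fs1": 100, "fs2": 40, "fs3": 10}   # free TB per file system
    new_dataset = 60                                  # TB to place
    fits = [name for name, free in containers.items() if free >= new_dataset]
    print("containers that can hold a 60 TB dataset:", fits)        # ['fs1']
    print("total free: %d TB" % sum(containers.values()))           # 150 TB
    # 150 TB free in aggregate, but only one container can actually take it.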
With OneFS, everything (performance, ports, capacity, etc.) is one massive, uniform, autobalanced file pool. Nothing -- but nothing -- could be simpler if we're talking files.
And, as I'm starting to discover, utter simplicity really, really matters at scale :)
Why This Matters For Customers
If you're dealing with big data, or just the proud owner of way too many traditional filers (!), you'll probably care about these results. Not only the numbers themselves, but how utterly simple it was to achieve them. The hardest part was probably rounding up all the equipment.
The Isilon results show what can be accomplished with scale-out, shared-nothing NAS using commodity technology and purpose-built software, i.e. OneFS. It's hard to imagine a more traditional filer approach getting anywhere close to these results. We'll see how long these results stand :)
If you're a smaller NAS vendor in the storage industry -- and you plan on selling to these demanding customers -- you've got some serious work ahead of you to catch up.
The race is on ...
(charts, links, etc. below)
A nice graph showing the flat linearity of performance per node
And, if you're still reading, you might be interested in this writeup from ESG.
Hi Chuck, well done. Take a look at http://spec.org/sfs97r1/results/res2006q2/sfs97-r1-20060522-00263.HTML
NetApp did a million+ IOPS in 2006 with just 2,000 FC drives. Why does EMC Isilon need 3,000+ drives plus some SSD to achieve similar results?
I did not expect this from the mighty EMC. Maybe 5 million IOPS would be fair, but this is just about bragging rights to me.
-TM
Posted by: TM storage | June 27, 2011 at 12:14 PM
As you should well know, the sfs97 tests and the sfs2008 test are *completely* different and not comparable in the least.
The SPEC website is pretty clear on this point: http://www.spec.org/sfs2008/
Did you not know that? Or is this some sort of lame FUD?
-- Chuck
Posted by: Chuck Hollis | June 27, 2011 at 12:44 PM
Still no SPC-1. IOPS is one thing, but how much will those numbers cost?
Posted by: tgs | June 27, 2011 at 12:47 PM
They'll cost 144 times the cost of a node, I'd imagine. As with any storage, work out how much you need and at what performance level, calculate the cost of supporting it, then choose the best fit for your environment.
Isilon has proved almost linear scalability, which is compelling for growing datasets.
Posted by: Anonymous | June 27, 2011 at 02:03 PM
For those of you who are predictably skeptical of standard benchmarks, I'd encourage you to head over to the SPEC site and peruse the documentation in detail. In particular, I was impressed that the test profiles are based on actual workloads seen on thousands of production file servers, often from member organizations.
Say what you will, but I think that approach offers a reasonable approximation for the desired purpose.
Posted by: Chuck Hollis | June 27, 2011 at 07:28 PM
LOVE your EMC Isilon banner campaign in support of this report, Chuck - brilliant marketing, as usual. (Is that a Joint Strike Fighter? Or an F16? Because that's not very 2011 either): http://finance.yahoo.com/q?s=ntap&ql=1 - Cheers, Kees
Posted by: Kees Henniphof | June 30, 2011 at 10:43 AM
Hadn't seen it ... thanks!
Posted by: Chuck Hollis | June 30, 2011 at 10:58 AM
It would be interesting to see how that compared to NetApp's scale-out offering on their new hardware platforms. They haven't benchmarked that for four or so years.
Posted by: SB | July 20, 2011 at 09:09 PM
You're right, it *would* be interesting to see a modern SPEC benchmark from NetApp.
Frankly, I think they're at a serious disadvantage these days, product and architecture-wise, so I don't expect them to do so anytime soon.
-- Chuck
Posted by: Chuck Hollis | July 21, 2011 at 11:28 AM
Revisiting this blog post, Chuck. Looks like NetApp re-ran benchmarks and posted new results (for SPECsfs2008). They have all kinds of news and technical reports published on it too. Funny enough, their math is wrong on the % IOPS comparison in all publications. For all other metrics they did the math right... (1 - resultsA/resultsB) x 100 = % improvement over vendor A. Except with IOPS. See this publication: http://media.netapp.com/documents/TR-3990-1111.pdf, and this news release: http://www.netapp.com/us/company/news/news-rel-20111102-42509.html
In all cases, their marketeers tried the above formula and came up with 35 or 36% when comparing the two. Funny enough, I get 26%, no matter how many times I try it. Can we trust anything they publish???
Posted by: Robert Pell | November 02, 2011 at 01:35 PM
I think the problem is that NetApp (like many smaller vendors) can get all caught up in their enthusiasm around themselves, their products, being a good place to work, etc. -- and sort of lose sight of the world around them: customers, partners, etc.
I hope they never, ever change :)
-- Chuck
Posted by: Chuck Hollis | November 02, 2011 at 03:02 PM