The Storage Performance Council has been around since 2002. Its stated goal is to provide independent cost/performance metrics for storage arrays.
But has it achieved its stated goals? And does anyone (other than a few vendors) take them seriously?
We occasionally get asked why we don't participate, and there are some good reasons -- none of them particularly sinister (apologies in advance to conspiracy theorists everywhere!).
But why is there no good independent performance metric for storage?
And why do some vendors (like IBM) continue to insist that the test (and their participation) is relevant?
It's an interesting perspective on the storage industry, and how simple things can get really complex once you look at them.
The Goal
Way back when, storage devices were generally slow and really expensive, so there was a lot of interest in storage performance, especially when balanced against cost.
Not all arrays were designed the same, and there was a very wide range in cost/performance from vendor to vendor.
The goal of the Storage Performance Council (actually, I think the "council" is one guy who runs it part-time) was very laudable: establish a uniform testing procedure that could give customers a simple comparison to stack products up against each other.
The first test suite was the SPC-1. If you go to the site, you'll see that there haven't been many submissions lately. Oversimplified, its workload model seems to simulate a transactional environment.
The newer test (SPC-2) tries to target a different performance profile than the SPC-1, in that it seems to focus on sequential data access as you'd find in a video streaming or backup-to-disk environment. More submissions from more vendors here.
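If a picture helps, here's a toy sketch of the two access patterns in Python -- strictly an illustration of "small random transactional" versus "large sequential streaming", not the actual SPC workload definitions, and the block sizes and read/write mix below are assumptions on my part:

```python
import random

DEVICE_BYTES = 100 * 2**30   # pretend 100 GiB test volume

def transactional_like(n_ios, read_fraction=0.6, block=4 * 1024):
    """Toy stand-in for a transactional profile: small blocks, uniformly
    random offsets, a mix of reads and writes.
    (Illustrative only -- not the actual SPC-1 workload specification.)"""
    for _ in range(n_ios):
        op = "read" if random.random() < read_fraction else "write"
        offset = random.randrange(0, DEVICE_BYTES - block, block)
        yield (op, offset, block)

def streaming_like(n_ios, block=256 * 1024):
    """Toy stand-in for a sequential/streaming profile: large blocks read in
    strictly increasing order, as in backup-to-disk or video streaming.
    (Illustrative only -- not the actual SPC-2 workload specification.)"""
    offset = 0
    for _ in range(n_ios):
        yield ("read", offset, block)
        offset = (offset + block) % DEVICE_BYTES

print(list(transactional_like(3)))   # scattered 4 KB reads and writes
print(list(streaming_like(3)))       # back-to-back 256 KB reads
```

Two very different things to ask of an array -- which is part of why a single number from either test tells you so little.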
So What Went Wrong?
I don't know how many of you have ever spent much time digging into the deep, gritty details of storage performance -- especially storage at scale -- but it is mind-numbingly complex.
No simple analogy with car performance will do here -- there are more wrinkles to this subject than can be covered in a short post, and -- unfortunately -- many of them often matter in the real world.
A simplified test -- of any sort -- that doesn't accurately predict real-world performance will be of little use. And -- to this day -- no one has come up with a simplified storage performance test that can capture enough richness of real-world use cases to be of much value to either vendors or customers.
Note: if you're feeling particularly cruel, ask one of the vendors who tout the SPC to explain how its results can be used to predict performance in your specific environment, and watch the fun ...
Sad, but true. Occasionally there will be a wave of resentment from some of the more vocal industry analysts over this poor state of affairs. Some will even blame big companies like EMC for somehow causing this situation to exist.
Guys, we didn't invent storage, and we don't control the wide range of ways customers actually use it. And if there was some simple test that was an accurate predictor of real world performance, man, we'd be all over that.
Because we spend way too much money and time doing performance testing the hard way.
I think most storage users have figured this out. We've never done an SPC test, and probably will never do one. Anyone is free, however, to download the SPC code, lash it up to their CLARiiON, and have at it.
What We've Learned About Performance Testing
I thought I could capture the essence of EMC's view here, but it's going to take too long. Let's just say we believe there's no substitute for a real-world test. And, if you're going to use an array for multiple purposes, the interaction between workloads matters. And, of course, performance during degraded conditions (such as a drive failure) will be very interesting. And so will what happens when the I/O profile changes dramatically.
Exchange drives a different I/O profile than Notes, and it matters in the real world. Ditto for Oracle's I/O profile vs. SQL Server vs. UDB. The list keeps going on, and on, and on ...
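To make the "interaction" point concrete, here's a bare-bones sketch of the kind of harness you'd need just to watch several profiles collide on the same device at the same time. The profile names and parameters are placeholders I made up for illustration (and it's a POSIX-only toy), not measured application behavior:

```python
import os
import random
import statistics
import threading
import time

# Hypothetical per-application profiles: block size, read fraction, random or
# sequential.  The names and numbers are made-up placeholders, not measured
# behavior of any real application.
PROFILES = {
    "mail-like":   dict(block=32 * 1024,  read_frac=0.5, random=True),
    "oltp-like":   dict(block=8 * 1024,   read_frac=0.7, random=True),
    "backup-like": dict(block=256 * 1024, read_frac=1.0, random=False),
}

def run_profile(path, name, p, seconds, results):
    """Issue I/O against `path` with one profile and record per-I/O latency."""
    size = os.path.getsize(path)
    latencies = []
    fd = os.open(path, os.O_RDWR)
    try:
        offset, deadline = 0, time.time() + seconds
        while time.time() < deadline:
            if p["random"]:
                offset = random.randrange(0, size - p["block"], p["block"])
            else:
                offset = (offset + p["block"]) % (size - p["block"])
            t0 = time.time()
            if random.random() < p["read_frac"]:
                os.pread(fd, p["block"], offset)
            else:
                os.pwrite(fd, b"\0" * p["block"], offset)
            latencies.append(time.time() - t0)
    finally:
        os.close(fd)
    results[name] = latencies

def run_mixed(path, seconds=10):
    """Run all profiles against the same device at the same time -- which is
    exactly where the interesting interactions show up."""
    results, threads = {}, []
    for name, p in PROFILES.items():
        t = threading.Thread(target=run_profile,
                             args=(path, name, p, seconds, results))
        threads.append(t)
        t.start()
    for t in threads:
        t.join()
    for name, latencies in sorted(results.items()):
        latencies.sort()
        print(f"{name:12s} p50={statistics.median(latencies)*1000:6.2f} ms  "
              f"p99={latencies[int(len(latencies)*0.99)]*1000:6.2f} ms")

# e.g. run_mixed("/path/to/large_scratch_file.img")  -- placeholder path
```

Point it at a reasonably large scratch file and the interesting part isn't any single number -- it's what each stream does to the others' latencies.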
And that's just the beginning -- I have not begun to do the topic justice. Maybe the Storage Anarchist will help me out here.
I'd hazard a guess that we've got multiple billions of dollars invested over the years doing performance characterization. That's not a typo. It's such an ingrained part of our culture (and our R+D spend) that it's hard to measure accurately.
So, our belief is that if performance is important to you, you're not well served by simplistic test results.
And if performance isn't all that important to you (which is increasingly common), other factors will dominate your decision-making process.
Which Brings Up The Topic Of Benchmarketing
I haven't looked at the submitted configurations for SPC results for a while, but a few years back, they were excellent examples of -- well -- some of the most creative storage configurations I have ever seen.
I saw stuff there that no sane customer would ever put into production. So, if no one would actually buy a configuration like that, why would you test it and publish the results?
The answer to this, sorry to say, is when vendors confuse value-added benchmarking with benchmarketing.
In the former, you pick a customer use case and say "here's what you're likely to see in your environment".
In the latter, enormous effort is spent to come up with a better number, and then bash all the competitors with it ruthlessly.
It doesn't matter if the number is relevant or not. Just that Vendor A beat Vendor B.
Not to throw rocks, but that seems to be IBM's game plan for storage marketing, at least for SVC.
I found that IBM claimed to be the fastest storage virtualization product on the SPC. If you have a taste for over-exuberant press releases, go have a read.
Last time I looked, they were the ONLY storage virtualization product on the SPC list.
So, while a technically correct claim, it is very misleading. In a contest where you're the only entrant, I'd hope you'd win.
Now, to be fair, there's a smattering of HP array product there, some of Sun's portfolio, and Fujitsu's product as well.
No Hitachi. No EMC. No NetApp. Lots and lots of vendors have skipped the exercise entirely. Maybe for different reasons than EMC, but the net result is the same.
And, interestingly enough, only a few select products from the IBM/HP/Sun portfolio. I wonder why this is? You'd think that if the SPC was a valid, customer-oriented test, they'd test the majority of their products, and not cherry pick.
But I digress.
Will We Ever See A Simple Storage Performance Test?
Well, there are already lots of them out there, including the SPC.
Problem is that none of them do a good job of predicting real world performance for anything other than, ummm, running that specific test.
The real question is whether the problem is solvable, or not. I will offer the opinion that the problem is getting worse, and not better.
We're seeing a wider and wider range of storage use cases, each of which translates into unique I/O profiles. We're seeing more and more consolidation, which means running more use cases on the same device at the same time. And data volumes continue to grow, which means that more often, the only valid tests are those done at significant scale.
The prognosis is not good for SPC, or any similar effort. Realistic performance testing is getting harder, not easier.
So What's A Customer To Do?
In my mind, you have a choice.
You can either build a very expensive test bed, and find out for yourselves, or you can partner with a vendor you trust that can understand your environment, make some good recommendations, and hopefully be there for you if things don't go as planned.
Now, I've met very large customers with very critical applications who do their own testing or have us do it. It's worth it to them to know what's fastest, because it's critical to their business.
But, for the vast majority of storage customers, the second option (find someone you trust) is probably more realistic.
Just don't count on a simple test to give you the magic answer.
I think you've seriously missed the point here. SPC tests do not attempt to 'predict' real-life performance in any particular customer environment.
What they DO provide is a common workload that can be used to scale, or dare I say benchmark, one product against another in a controlled and consistent manner.
Posted by: orbist | July 13, 2007 at 01:49 PM
Hi Chuck,
I agree that these SPC tests are not worth the paper they're written on (actually, I'm not sure you even get a paper printout ;-).
I once remember seeing that an EVA had beaten an HDS 9900 or 9980 (memory escapes me) for a long, large-block sequential workload written to a disk group of about 100 disks, alongside another, this time a smaller-block random pattern written to a separate disk group of 50 or so disks. The thing is, they had turned off cache mirroring, which you can't do on the 9900, and had the LUN for the sequential workload owned by one controller and the random one owned by the other controller.
They had big smiles on their faces as well -- as if their tests actually meant anything in the real world!
I also agree with the consolidation thing, with many customers caring less about the sheer speeds of the box and more about the functionality it supports and how it copes under multiple different workload types.
I mentioned this trend in a recent post. I'm basically seeing quite a lot of small-configuration enterprise boxes being sold into environments that might previously have tended towards a midrange solution.
Nigel
Posted by: Nigel | July 13, 2007 at 05:01 PM
Hi Orbist
No, with all due respect, I don't think I have missed the point. Yes, it's a (sort of) repeatable test in customer hands, less so in vendors'.
But I am absolutely stuck on the thought that what use is a repeatable test unless it has some sort of correlation to the real world?
Otherwise, it's just an exercise in technical self-gratification, I think.
Thanks for the comment!
Posted by: Chuck Hollis | July 13, 2007 at 07:22 PM
Hi Nigel
I take no small measure of pleasure that we are actually agreeing on something!
Since I spend a lot of time with customers, one of my "do you get it" indicators is if they start hammering me about the SPC.
I'm more than happy to explain the situation. But sometimes they persist.
If they do, I suggest to the rep that they might want to think a bit about how they approach this particular customer.
Posted by: Chuck Hollis | July 13, 2007 at 07:30 PM
Chuck,
Thanks for the response. I can sort of see where you are coming from, and I do agree that today's storage infrastructures require a lot of fine-tuning on an instance-by-instance basis to ensure maximal performance in any given environment.
I think everyone understands that vendors' headline figures (which are usually unrepresentative of ANY real-life workloads -- that is, small read-cache-hit measurements) should be taken with a pinch of salt. However, surely you would admit that the SPC tests at least attempt to benchmark some kind of realistic workload, with mixed reads and writes, streams that repeat I/O over a range, and a requirement that response time be kept under 30ms. I guess if nothing more, it shows an openness from those vendors that do take part.
Nigel's comments do bring up the point of needing a consistent strategy from the SPC testing point of view. However, from what I've read, any 'cache disabled' results are clearly marked in the title, and the SPC insists on at least 50% disk utilisation (which is a common strategy used by most shops when disk performance is key).
Just my 2c
Posted by: orbist | July 14, 2007 at 03:18 PM
Why does EMC participate in SPEC? If all tests that aren't a direct test of a production environment are bogus, then why submit to SPEC?
Posted by: blake | July 14, 2007 at 07:11 PM
Hi Orbist
I guess I'd differentiate between a smart technical type using the SPC code to do his/her own testing (fully aware of the pros and cons of doing so), and the public spectacle of SPC press releases.
In the former case, the tester is free to modify not only the test but the testing conditions to more accurately reflect the purpose at hand. Not so in the latter case.
As far as modifiable test beds go, EMC has invested dozens of man-years just building testing simulators (let alone doing the testing), but not everyone has a rationale to do this sort of work.
I, for one, don't know where the "50% full for performance" urban myth originates. I understand the technical reasons why perhaps NetApp and IBM might have to do this, but it is not a standard EMC recommendation by any stretch.
I, for one, would think that this would make nominally cost-effective storage very, very expensive in practice.
Thanks for writing!
Posted by: Chuck Hollis | July 15, 2007 at 09:02 AM
Hi Blake
Good question. I don't have a good answer.
I think that the SPEC was pretty well established and accepted (warts and all) before EMC came to market with NAS devices, so I think we were faced with a different choice there.
SPC does not enjoy those same circumstances, so it might be a different situation.
Good point, though -- as any criticisms leveled at the SPC could be aimed at the SPEC as well.
Posted by: Chuck Hollis | July 15, 2007 at 09:05 AM
Thanks again Chuck for the reply. I guess we'll have to agree to disagree on this one.
As for the 'urban myth' of short-stroking magnetic media -- it's no myth. Reduce the distance the physical arms have to move and you reduce seek and latency time quite dramatically.
Posted by: orbist | July 15, 2007 at 02:19 PM
Hi Orbist
We're talking about different things.
You're talking about short-stroking, a well-understood (but expensive) practice used when random reads are the dominating I/O profile, and there's insufficient NV cache to soak up writes.
I'm talking about the "recommendation" from some array vendors that only half the capacity be used in normal production to get decent (not optimum) performance, due to poor controller design, poor microcode design, or sometimes both.
I bet you don't short-stroke your entire pool of production arrays -- why should we see SPC tests for unrealistic configurations that no one is likely to put into production?
It's possible on large cached arrays (such as DMX) to lock entire volumes in cache, resulting in unbelievable performance. Should we submit this as our example of "short stroking"? No, because that would mislead people, wouldn't it?
Same idea here.
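To put rough numbers on why short-stroking looks so good on a benchmark -- and why nobody runs production that way -- here's a back-of-the-envelope model. The drive parameters are assumptions for illustration, not measurements of any real drive:

```python
# Back-of-the-envelope model of short-stroking.  The drive parameters are
# assumptions for illustration, not measurements of any real drive.
FULL_STROKE_SEEK_MS = 8.0    # average seek across the whole platter (assumed)
SETTLE_MS = 1.0              # fixed head-settle overhead (assumed)
ROTATIONAL_LATENCY_MS = 3.0  # roughly half a revolution at 10k rpm
TRANSFER_MS = 0.1            # small-block transfer time (assumed)

def random_read_iops(used_fraction):
    """Rough IOPS for uniformly random small reads when only the outer
    `used_fraction` of the stroke is in use.  Seek time is modelled as a
    fixed settle cost plus a term proportional to seek distance."""
    seek_ms = SETTLE_MS + FULL_STROKE_SEEK_MS * used_fraction
    service_ms = seek_ms + ROTATIONAL_LATENCY_MS + TRANSFER_MS
    return 1000.0 / service_ms

for fraction in (1.0, 0.5, 0.25):
    print(f"using {fraction:4.0%} of capacity: "
          f"~{random_read_iops(fraction):.0f} IOPS per drive, "
          f"{1 / fraction:.0f}x the drives (and cost) per usable TB")
```

Under those assumptions you roughly double the IOPS per drive -- in exchange for four times the spindles (and the cost) per usable terabyte.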
So, what do you use the SPC information for? I'd be curious.
Posted by: Chuck Hollis | July 16, 2007 at 04:41 AM
I agree that the SPC can't possibly match a clients workload, but at least it is an open documented workload. I know that EMC does quite a bit of testing, but if the results are going to mean anything, the test methods have to be inspectable by the clients and the competition.
If HP performed a "benchmark test" comparing their controller with yours, but they deactivated caching and gave your box slower disks, the results would be slanted in their direction. Now imagine they did not disclose their testing methods. It would look like their box was better.
The only way to meaningfully compare performance is if the method of comparison is public.
Posted by: open systems storage guy | July 17, 2007 at 10:43 AM
Hi Blake,
Comparing SPEC and SPC is a bad idea since they operate in drastically different manners, and SPEC for NFS doesn't favour JBOD -- something the SPC tests most certainly do.
If you check the SPC benchmark results you'll notice that only IBM & Sun have recent results listed, with HP not having submitted anything for well over a year, and Hitachi, NetApp, and EMC absent entirely.
http://www.storageperformance.org/results/benchmark_results_all
What we have here is a case of IBM competing against itself in a benchmark the majority of the industry appears to have found fault with.
Posted by: Storagezilla | July 17, 2007 at 02:41 PM
Hi OSSG (open systems storage guy)
I think we're not seeing eye to eye on a key point.
Whereas the SPC is a (somewhat) repeatable workload, the testing conditions let vendors vary some key variables.
As an example, when vendors get to choose short-stroked 36GB drives for their configs, turn off write-back cache, or take other "creative" approaches, I still wonder at the relevancy of the test.
As Storagezilla points out, both SPC1 and SPC2 are designed to defeat the benefits of cache -- SPC1 by randomizing I/Os and focusing on reads, the SPC2 by simply streaming files.
To the extent that you, the customer, want to use no-longer-manufactured drives in an inefficient manner, or expose yourself to risk by turning off cache protection, or have convinced yourself that storage cache has no value in your environment, then the tests are valid.
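If you want a feel for just how thoroughly a uniformly random read workload defeats read cache, here's a toy model -- assumed sizes and a plain LRU cache, which is cruder than what real array caches do, but the arithmetic is the point:

```python
import random
from collections import OrderedDict

def lru_hit_rate(cache_blocks, working_set_blocks, n_ios=200_000):
    """Simulate a plain LRU read cache under uniformly random reads.
    With a uniform random workload there's nothing to predict, so the
    steady-state hit rate is just cache_blocks / working_set_blocks."""
    cache, hits = OrderedDict(), 0
    for _ in range(n_ios):
        block = random.randrange(working_set_blocks)
        if block in cache:
            hits += 1
            cache.move_to_end(block)          # refresh recency on a hit
        else:
            cache[block] = True
            if len(cache) > cache_blocks:
                cache.popitem(last=False)     # evict least recently used
    return hits / n_ios

# Small numbers so the simulation finishes quickly:
print(f"1,000-block cache, 100,000-block working set: "
      f"~{lru_hit_rate(1_000, 100_000):.1%} hit rate")

# At more realistic ratios the analytic answer is all you need --
# e.g. 32 GB of read cache in front of a 10 TB random working set:
print(f"expected hit rate ~ {32 / (10 * 1024):.2%}")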
Again, many of the posters stress the repeatability of the SPC. While I will partially grant that point, no one has been able to make a case for relevancy.
Imagine I create the PDB (pencil drop and bounce) benchmark for storage. This consists of dropping a pencil (eraser end down) on the top of a storage array, and watching how high it bounces. Higher scores mean you have a better array, according to the test.
I will shortly announce the formation of the PDB Council to ensure that all tests are conducted in an open and fair manner.
I can make it very repeatable, open, independent, etc. -- but is it relevant?
BTW, for the record EMC publishes full details (down to excruciatingly boring detail) on every benchmark we run on our own (or competitor's) gear.
A key part of our customer base is made up of some very sophisticated storage guys; they do not appreciate "creativity".
Thanks for writing!
Posted by: Chuck Hollis | July 18, 2007 at 07:35 AM
Hey open systems storage dude.
I'm just pointing out that Chuck calls any industry benchmark bad, so why does EMC submit results to SPEC?
Sure the tests are different, but that's not my point.
Posted by: blake | July 19, 2007 at 01:47 AM
So, OK, this is a fun discussion, and I'm getting something out of it.
There is definitely an "any half-decent test suite is better than none at all", glass-is-half-full school of thought out there.
And, to be honest, I can't argue with that. Tools are tools, and -- in the right hands -- they can be effective.
From that perspective, SPC is not a bad thing, nor is IOzone, nor is SPEC and so on. Smart people know what they're getting into -- warts and all.
And there is a second school of opinion that any semi-standardized comparison is better than none at all. Yes, the test has its flaws (both in construction and vendor methodology), but they're well known, and at least we -- as consumers -- have some sort of metric to look at.
And then there is a third school of thought (mine) that says that the flaws in both test construction and methodology are so severe that -- on the whole -- the test does more harm than good, especially when transformed into a benchmarketing tool by an aggressive vendor such as IBM.
SPC1 is a random IO generator that runs for a moderate period of time. I can't easily map its behavior into any mainstream application profile that I'm aware of.
Being a random IO generator, it favors smaller, short-stroked disks, and does not favor any design that uses read cache, read-ahead techniques, and the like.
In this environment, small JBOD disks will most likely deliver the best cost/performance ratio.
It does not contemplate use of local or remote replication. It does not contemplate performance in the event of a component failure. It does not contemplate a changing I/O mix or rate. It does not contemplate a mixed I/O profile. I could go on, it's a long list.
SPC2 appears to be a file streaming benchmark, and -- as such -- I can make a bit more of a case that there are real-world applications (such as video streaming and the like) where it might be just a tad more relevant.
I take issue with the less-than-relevant configurations that some vendors post. Go take a look at what they've tested, and ask yourself, would you ever put in a configuration like that?
I note that vast portions of the storage vendor landscape have voted to pass on this particular exercise (EMC, HDS, NTAP, Dell and many, many others).
Other than IBM, those that participate to some degree appear to do so half-heartedly -- an obsolete config here, a small array there -- and do not embrace it in a consistent fashion.
Even IBM cherry picks to suit their interests, which -- in my mind -- has turned the SPC into IBM's favorite storage marketing tool, especially in regard to SVC.
The degree of insincerity and manipulative behavior from IBM on this issue is appalling, and reflects poorly on an otherwise fine company.
Fortunately, it does not appear to be effective in the marketplace. The question in my mind is -- why do they persist?
Posted by: Chuck Hollis | July 19, 2007 at 08:13 AM
I can tell you, if EMC could win this performance benchmark, they would have done the test and not only published it but, knowing EMC, would have put a half-page ad in the Wall Street Journal and sent teams flying to NetApp customers globally, talking about this benchmark...
This is like saying, 'Since I can't win this game, the game is flawed'
EMC is exposed!
Posted by: Storage dog | February 04, 2008 at 06:36 PM
Storage dog, I guess you don't know us very well.
We think the test is flawed. Culturally, we don't like flawed tests. Too many storage engineers here to make that one fly.
We think the SPC methodology and organizational construct is flawed. We really don't want to lend credence to something we don't believe in.
Finally, getting a half-page ad in the WSJ would be a lousy way to get this sort of message out.
The bit about sending flyers to NetApp customers, well, I wouldn't put that one past us ...
Thanks for writing!
Posted by: Chuck Hollis | February 05, 2008 at 02:09 PM
"I'd hazard a guess that we've got multiple billions of dollars invested over the years doing performance characterization. That's not a typo. It's such an ingrained part of our culture (and our R+D spend) that it's hard to measure accurately."
EMC's *total* R&D spend in 2007 was $298M. 2006 was $143M. 2005? $72.6M.
Even assuming that all R&D spend since EMC's founding went exclusively to performance characterization, "multiple billions" strikes me as a very hazardous guess.
Posted by: Tom | May 08, 2008 at 11:31 AM
First, your numbers aren't even *close* to the actual spend. I wonder where you came up with them, since we don't routinely disclose precise R+D spend.
Second, some of the investment appears as customer support or environmental qualification, and is not entirely captured as R+D.
As an example, EMC publicly states that we spend more than 10% of revenue on R+D. If we're north of $10B in revenue, that's over $1B a year -- certainly a much larger number than you're offering, wouldn't it be?
Finally, I'd invite you to tour the physical eLab facilities where we do all this testing. There are multiples, so be prepared to travel a bit.
And I think you'd come away convinced regarding the validity of that statement.
Thanks!
Posted by: Chuck Hollis | May 09, 2008 at 06:27 AM