For those of you who are trivia buffs, you’ll probably remember the sad story of Radithor. Shortly after radioactivity was discovered in the early 1900s, the market was flooded with radioactive patent medicines that made all sorts of claims.
Radithor was made by adding a small amount of radium to distilled water. Nasty stuff. But it was sold as a cure for many ills.
In the case of Radithor, the slow, horrible death of a man who drank hundreds of bottles contributed to the establishment of the modern FDA, which – at a high level – ensured that medicines did no harm, and hopefully did what they said on the label.
This turned out to be the worst form of quackery: people with a specific problem turned to these remedies for relief; not only did they not get better, they ended up with new problems.
Today, I’m going to take the gloves off and look at what I consider a particularly egregious example of benchmarketing in the storage industry.
Not only does it not do what it says on the label, you could end up far worse off than when you started. You can arrive at your own conclusions as to whether or not I’m overreacting.
Context
Benchmarketing (the conducting of unduly contrived performance tests to show your product in a favorable light) has been a staple of the IT industry since the very beginning.
All IT vendors practice this to a certain degree; the trick is to find the ethical boundaries – and of course, when it comes to ethics, standards vary widely.
Generally speaking, EMC tries to draw the line by avoiding benchmarks that don’t represent realistic use cases. We try to be very specific about how the test was conducted, and who conducted it. And we want to make sure – ultimately – that it doesn’t lead a customer to an incorrect conclusion that will get them (and us) in trouble.
We don’t claim our tests are independent when they’re not. We take great pains to make sure that the results are not only reproducible, but map well to typical use cases.
And if we’re not happy with the results, we fix the products, and not the test.
The Story
On November 7th, Network Appliance issued a press release announcing the results of independently conducted testing that compared EMC’s CX3-80 with NetApp’s new FAS3070. Needless to say, we were intrigued. We were familiar with both products, and many of the claims sounded a bit outrageous.
The first red flag we noticed was “independently tested by VeriTest”.
I suppose that VeriTest is independent the same way my lawyer is independent – both get paid to deliver very specific results.
To be fair, the test report is clear that NetApp compensated VeriTest to conduct very specific tests. However, in the same spirit of fairness, I don’t think most reasonable people would consider this an “independent testing” function.
If you look at the VeriTest web site, you’ll see that NetApp is using the “independent services” of VeriTest to do quite a lot of anti-EMC testing. Good for VeriTest, you’ve got a great customer in NetApp, but please don’t represent this as “independent testing” to the industry.
The second thing we noticed was that the testing configuration was very unusual. Although very large systems were configured, most of the tests were done on a relatively small, single 400GB LUN.
The test describes a 200-drive system. With 146GB drives, that’d be about 30TB raw capacity. 400GB is around 1.5 to 2 percent of the usable capacity. Strange. And why would you configure 400GB as a single, enormous LUN? Even stranger ...
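If you want to check the math, here’s the back-of-the-envelope version. The drive count and size are the figures just discussed; the usable-capacity factor is my own rough assumption covering RAID, spares, and other overhead:

```python
# Back-of-the-envelope check of the test configuration. The drive count and
# size are the figures discussed above; the usable-capacity factor is a rough
# assumption covering RAID, spares, and other overhead.
drives = 200
drive_size_gb = 146
raw_gb = drives * drive_size_gb      # 29,200 GB, call it 30TB raw
usable_gb = raw_gb * 0.70            # assume roughly 30% overhead
lun_gb = 400

print(f"raw capacity: {raw_gb / 1000:.1f} TB")                    # 29.2 TB
print(f"test LUN vs. usable capacity: {lun_gb / usable_gb:.1%}")  # about 2%
```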
And as we dug deeper, there were other things we noticed. Creating a single snapshot and never using it for anything, as one example. Running the tests for only a relatively short time, as another. It just didn’t add up.
So we decided to run some of our own tests.
First we were surprised, then we were outraged.
What We Found
Well, it turns out that NetApp’s choice of tests was not arbitrary; they were very carefully thought out. Clever, yes, but …
The first thing we found was clear evidence that when a NetApp device is configured with many LUNs and even partially filled, the performance profile changes very dramatically. Every NetApp performance test I’ve seen published uses 10% or less of the total capacity. Now we know why.
It gets wicked slow. Even if you don’t access the additional used capacity.
The engineers can probably explain it better than I can, but I’m led to believe it has to do with how WAFL acts as an intermediary between the LUN presented to the server and the actual layout of blocks on disk.
When there’s plenty of free capacity, WAFL is free to distribute things among the available spindles, so it runs very fast indeed. When things get even a little full, the same trick starts working against you.
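To make that intuition concrete, here’s a toy model in Python. To be clear, this is emphatically not WAFL’s actual algorithm – just an illustration of the general effect: a write-anywhere allocator has enormous freedom of placement when the disks are mostly empty, and has to hunt harder and harder for free blocks as they fill.

```python
import random

# Toy model only: WAFL's real allocator is far more sophisticated. The point is
# the general effect described above: a write-anywhere scheme can place data
# wherever it likes while the array is mostly empty, and must hunt harder for
# free blocks as it fills up.
BLOCKS = 100_000

def avg_probes_to_find_free(fill_fraction, trials=2_000):
    """Average number of blocks examined before a free one turns up."""
    free = [True] * BLOCKS
    for used in random.sample(range(BLOCKS), int(BLOCKS * fill_fraction)):
        free[used] = False               # mark this block as occupied
    total_probes = 0
    for _ in range(trials):
        i = random.randrange(BLOCKS)     # a write lands at a random spot
        probes = 1
        while not free[i]:               # scan forward until a free block appears
            i = (i + 1) % BLOCKS
            probes += 1
        total_probes += probes
    return total_probes / trials

for fill in (0.10, 0.50, 0.80, 0.95):
    print(f"{fill:4.0%} full: {avg_probes_to_find_free(fill):6.1f} probes per write")
```

In this toy model, the average hunt per write grows roughly as 1/(1 − fill): an allocator that needs about one probe per write at 10% full needs about twenty at 95% full.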
Naively, we would have thought that the primary reason people buy large storage arrays is to actually store data on them.
Having an array that runs substantially slower when it’s full would be an important issue if I were a customer concerned about performance.
Now, please don’t get me wrong, there’s a certain rough-and-tumble to the competitive side of the IT industry that’s all part of the landscape. And there’s lots of back-and-forth and he-said-she-said on performance claims. All part of the fun.
But I think this is crossing the line: this behavior would come as a real surprise to customers, given what they expect from a large storage array in terms of performance.
The second thing we found – and this was amazing – was that performance would degrade during long-running tests.
The thing would actually run significantly slower if you let the test run for many hours or days. The longer it ran, the slower it got.
From my point of view, this is definitely crossing the line. Apply a steady workload to a traditional storage array, and you wouldn’t expect performance to degrade 20-40% over a few days.
If you’re interested, here is the EMC-produced documentation on what we found. By the way, we’re not claiming it is independent; we did it ourselves. Judge for yourself.
And, if you’re really interested, here’s a simple little test you can run all by yourself on your very own NetApp filer to see performance degrade over time.
It doesn’t take a lot of time to set up, and I think you’ll be surprised.
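The original post linked to the actual instructions, which I won’t try to reproduce here. But the general shape of such a test is easy to sketch. Everything below is illustrative – the mount point is hypothetical and the parameters are arbitrary – and it assumes the target file sits on a LUN served by the filer under test:

```python
import os
import random
import time

# Minimal sketch, not the original test instructions. Assumptions: TARGET sits
# on a LUN served by the filer under test, and you let this run for hours or
# days (stop it with Ctrl-C). It applies a constant random-overwrite workload
# and prints throughput once per interval, so any slowdown over time is visible.
TARGET = "/mnt/filer_lun/testfile"   # hypothetical mount point, adjust to taste
FILE_SIZE = 2 * 1024**3              # 2 GiB working set
BLOCK = 64 * 1024                    # 64 KiB writes
INTERVAL = 60                        # report throughput every 60 seconds

mode = "r+b" if os.path.exists(TARGET) else "w+b"
with open(TARGET, mode) as f:
    f.truncate(FILE_SIZE)            # pre-size the working set
    while True:
        written, start = 0, time.time()
        while time.time() - start < INTERVAL:
            f.seek(random.randrange(0, FILE_SIZE - BLOCK))
            f.write(os.urandom(BLOCK))
            written += BLOCK
        f.flush()
        os.fsync(f.fileno())         # push the interval's writes to the array
        mib_per_s = written / INTERVAL / 1024**2
        print(f"{time.strftime('%H:%M:%S')}  {mib_per_s:8.1f} MiB/s")
```

The random-overwrite pattern is the interesting part: steady churn against the same working set is what exercises the allocation behavior described earlier. If performance degrades the way we observed, the per-interval throughput will trend downward over hours or days.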
Back On My High Horse
OK, now let’s look at it from a customer perspective. A customer thinks that they’re going to need good performance for their environment – and, fair enough, not all do. They go looking for products that offer substantial performance, and may even pay a premium for this.
Let’s just say that they believe NetApp – that the tests were independently conducted, and that the FAS3070 offers astounding levels of performance compared to one of the industry standards, the EMC CLARiiON.
Now, let’s say this poor guy loads up a couple of important applications. He comes back in a week or so and wonders what happened to performance. Or he decides to start filling up the array with LUNs and data, and the same thing happens.
This poor guy is probably going to be in a world of hurt.
His only options are to buy additional unused capacity and let WAFL spread things around, or to get a traditional SAN device that doesn’t try to be clever the way WAFL does.
I also think it’s fair to point out that part of NetApp’s marketing pitch is that you’ll need less storage in your environment, and not more.
I can’t say that any of the NetApp materials I reviewed are technically incorrect. They are accurate as written. But unless you are an extremely skilled interpreter – and very suspicious by nature – they will lead you to a painfully incorrect conclusion.
Radithor Redux
So, it looks like we have a “Radithor” example.
Not only does it not do what it says on the label, there’s a good chance you’ll end up worse off than you started.
And, in the IT industry, there’s no FDA around policing vendor claims.
Well, maybe there should be.