In some corners of our industry, technology discussions can turn into interesting studies of human behavior.
Even though EMC’s capabilities span a very broad range these days, we still get involved in our fair share of storage technology debates. Some are relevant, others aren’t.
And, since I once was a crusty, knuckle-dragging storage guy, occasionally I feel compelled to wade into certain areas.
And today, we talk a bit about disk drive failure rates.
Maybe you saw the interesting white paper from a team at Google.
They tracked a population of disk drives over a period of five years, and concluded “hey, the data doesn’t really match up to what we might have thought”.
And then the blogging started. Responses to responses. Vendor posturing.
Many of us took a look at this and thought “sheesh, what’s the big deal?”
So here are a couple of thoughts on the discussion.
Sometimes disk drives don’t fail as you’d expect.
That’s about the only thing you can conclude from the white paper.
Crossing the chasm from that observation into speculation about why that might be the case isn’t supported by the data, at least in several people’s opinion.
Why is that?
Well, there are just too many uncontrolled variables. We have too little information about drive specifics, how they were used, how they were maintained (if at all), and so on.
Anyone who’s worked in the storage industry knows that there are dozens and dozens of variables that can affect the life of a drive.
Externally – how was it mounted, what about orientation, vibration, what about duty cycle, was it in a long FC loop, temperature spikes, did it get firmware updates, and so on and so forth. Long list here.
Internally, what rev of the mechanicals, media substrate, head composition, lubricant, firmware, interface, were the drives screened, etc. etc. Another long list here.
You get the idea – there are just too many potentially significant variables in play here to conclude much of anything tangible. And if you think those sorts of variables are held relatively constant over a long period of time, well, that’s just not the case.
Complicating matters is the fact that disk drive technology is evolving incredibly rapidly, so even if you could make some sort of sense of the historical study, it’d be unlikely to apply in the future.
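To make the uncontrolled-variables point concrete, here's a toy sketch in Python. Every number in it is made up for illustration (none of it comes from the white paper or from any vendor), and `blended_failure_rate` is just a hypothetical helper name. It assumes one hidden variable, how gently the drives were treated, and shows how the overall failure rate shifts with a mix nobody wrote down.

```python
# Toy illustration only: all counts and rates below are invented,
# not taken from the white paper or from any vendor data.

def blended_failure_rate(groups):
    """Return the overall annual failure rate for a mixed drive population.

    groups is a list of (drive_count, annual_failure_rate) pairs, one per
    hidden sub-population (say, well-cooled vs. badly-cooled drives).
    """
    total_drives = sum(count for count, _ in groups)
    expected_failures = sum(count * rate for count, rate in groups)
    return expected_failures / total_drives


# Hypothetical: the same drive model fails at 2% per year when treated
# gently and at 10% per year when treated harshly.
GENTLE, HARSH = 0.02, 0.10

# Two shops run the same model, but with different (unrecorded) mixes.
shop_a = [(9000, GENTLE), (1000, HARSH)]
shop_b = [(5000, GENTLE), (5000, HARSH)]

for name, groups in (("shop A", shop_a), ("shop B", shop_b)):
    rate = blended_failure_rate(groups)
    print(f"{name}: {rate:.1%} of drives fail per year")
```

Same drive, same behavior under each condition, and one shop sees roughly 2.8% while the other sees 6%. If the blended number is all you have, you can't tell which variable moved it, and a real population has dozens of them, not one.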
Maybe a non-technical analogy is in order ...
Let’s say you ran a human mortality study by tagging everyone who passed through the Atlanta airport on a given Wednesday.
And you followed them for five years, and somehow your data didn’t match up with what you’d expect.
You could come up with all sorts of wild speculation around airports being suspect, airplanes being suspect, connecting flights being suspect, airlines being suspect, Wednesdays being suspect, Atlanta being suspect, the government lying to us, and so on.
None of which would have much of a valid basis, right?
All you could say is that your data wasn’t quite what you expected, and more research is called for. That seems to be the case here.
Some people have their pattern recognition circuitry turned way up.
All of us have an instinct to detect and recognize patterns. It’s a key part of human intelligence.
But our capabilities are not perfect. Sometimes there’s a pattern, sometimes there’s not.
We see castles in the clouds, faces in the moon, and so on. Sometimes it turns a bit darker, and we think things are there that aren’t supported by the data.
Thank god for statistics and the scientific method.
In some of the blog posts, I think certain people had their pattern recognition circuitry turned up a bit too high. Either that, or they thought that by being controversial, they could increase their presence in the community.
Do I think that the white paper findings are interesting, and deserve more study? Yes.
Do I think there is a conspiracy among vendors to mislead the public?
Don’t be ridiculous.
You guys are giving us way too much credit here.
Some vendors will capitalize on just about anything.
Thinking back to the post 9/11 era, I remember all the tacky marketing campaigns from data protection vendors with the unspoken message “it could happen again!”.
I find that sort of marketing repugnant.
In one of the more strident blogs, NetApp took the opportunity once again to position themselves as both concerned citizens and thought leaders on this “important industry topic.” Lots of misrepresentation, skewing of the facts, etc. It didn’t come off too well for them, at least from my perspective.
I didn’t have to respond in detail, as others have taken the liberty this time.
At least they’re consistent.
The job of storage vendors is to protect users from disk drive failures.
Components fail. Sometimes they fail as you’d expect. Sometimes they don’t.
Storage array vendors can use a wide array of techniques (and we’re not just talking simplistic RAID) to protect against disk drive failures.
And trust me, once you wade into the arcane details, there’s a wide variability in how different vendors attack the problem.
Some approaches cost more than others. So there’s always a useful discussion around what you think you need, and what you think you can afford.
And, as disk drives evolve rapidly (which they’ve been doing for several years, and show every sign of continuing to do), the pros and cons of different approaches will vary over time.
What made sense a few years ago might not make sense today. Such is life in the technology world. Conventional wisdom changes faster than we’d like.
Once again, it depends.
We all should avoid any discussion that starts with “the only right way to …” Very few topics in storage, or technology (or life!), are that simple.
Avoid these people, please.
A Couple Of Final Thoughts
So, I guess we have preliminary data that says, sometimes, disk drives don’t fail like you’d expect them to.
That would mean they join the very long list of technology components that seem to exhibit the same behavior: my cell phone, my cable connection, the motherboard in that three-year-old PC I own, that damn plasma TV I bought a few years back that’s now useless, my growing collection of iPod bricks, and so on.
For me, I guess the real surprise is that people were surprised.