We’ve been talking about iSCSI in the storage business for at least five years, maybe more.
On the surface, it’s got almost everything going for it: good performance, attractive costs, leverage of existing network investments, a strategic roadmap to 10Gb, and so on.
You’d think it’d be pervasive in large enterprise shops these days. But it’s not.
It’s been on the list of industry predictions every year since 2004.
This is the year of iSCSI. No, wait, this is the year. Really guys, it’s going to happen this year. And so on.
But that hasn’t exactly happened – and it’s not for the reasons most people think.
I think iSCSI serves as an interesting case example of how change really plays out in the IT industry, and that things may not be what they first appear. There are lessons to be learned all around – not only for customers, but for IT vendors as well.
At first glance, iSCSI is pretty attractive. Not to oversimplify, but it's a protocol that enables the fundamental storage command set (SCSI) to work over an IP (ethernet) link.
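To make that layering concrete, here’s a minimal Python sketch of the SCSI half of the equation: the 10-byte READ(10) command descriptor block is the same whether it travels over Fibre Channel or gets wrapped in an iSCSI PDU and shipped over TCP/IP. (The PDU and TCP framing are omitted; this just shows the command set that iSCSI carries.)

```python
import struct

# Build a SCSI READ(10) command descriptor block (CDB). iSCSI's job is
# simply to carry bytes like these over an IP network instead of an FC
# fabric -- the storage command set itself doesn't change.

def read10_cdb(lba: int, blocks: int) -> bytes:
    """Build a 10-byte SCSI READ(10) CDB."""
    return struct.pack(">BBIBHB",
                       0x28,    # READ(10) opcode
                       0,       # flags (RDPROTECT/DPO/FUA all zero)
                       lba,     # 32-bit logical block address
                       0,       # group number
                       blocks,  # 16-bit transfer length, in blocks
                       0)       # control byte

cdb = read10_cdb(lba=2048, blocks=8)
print(cdb.hex())
```

The point is that the command never changes; only the transport underneath it does.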
All of us in the storage industry thought this was pretty cool.
Customers could save big money by avoiding the purchase of fibre-channel HBAs.
They could use lower-cost ethernet switches rather than their more pricey FC counterparts.
And there was the potential of a common technology and skill set to support both data networking and storage networking.
Certainly something worth pursuing.
And so we – and every other storage vendor – started to invest heavily in bringing iSCSI solutions to market. Let me take you through the journey of what we did, and what we learned.
The Red Herrings
There were some red herrings early on in the discussion that proved to be irrelevant.
At one time, the default assumption was that customers might need TOEs (TCP/IP Offload Engines) in their servers, which would mean new technology and higher costs. Not good.
Well, servers became faster than ever, and it soon became clear that there wasn’t going to be much need for a dedicated accelerator card to make iSCSI work well.
There’s also the perceived performance issue. Hey, iSCSI runs over 1Gb ethernet, and FC is either 2Gb or 4Gb, so doesn’t that make FC proportionally faster?
Well, not really. I joke that the electrons go just as fast over both.
On a more serious note, storage performance boils down to response time and transfer rate. Response time is roughly similar for both (sometimes iSCSI can be faster), and transfer rates are the same until you saturate the pipe, at which point you either want a faster pipe (read: FC) or an additional pipe (read: iSCSI) to spread the load.
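A quick back-of-the-envelope calculation shows why. Treating the advertised link speeds as raw upper bounds (real throughput is lower once you subtract TCP/IP or FC framing overhead), a sketch:

```python
# Rough upper-bound throughput per link, ignoring protocol overhead.
# Real-world numbers come in lower (TCP/IP overhead for iSCSI, framing
# overhead for FC), but the comparison still holds.

def raw_mb_per_s(line_rate_gbit: float) -> float:
    """Nominal line rate converted to MB/s (1 Gb/s -> 125 MB/s)."""
    return line_rate_gbit * 1000 / 8

links = {"1Gb iSCSI": 1, "2Gb FC": 2, "4Gb FC": 4}
for name, gbit in links.items():
    print(f"{name}: up to ~{raw_mb_per_s(gbit):.0f} MB/s")

# An application streaming, say, 60 MB/s fits comfortably inside a
# single 1Gb link; a faster FC pipe buys it nothing.
assert 60 < raw_mb_per_s(1)
```

The threshold where link speed starts to matter is simply the point where sustained transfer demand exceeds that ceiling, and most applications never get there.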
As we matched up iSCSI performance against our rather extensive knowledge of real-world workloads, it became pretty clear that a significant majority of applications could run comfortably on iSCSI with no negative performance impact whatsoever.
And there’s been good traction – in some segments
If you segment the storage market into different pieces, you’ll find segments where iSCSI is popular and growing in strength.
As an example, if I’m interpreting the IDC numbers correctly, in the first three quarters of 2006, $369 million of iSCSI arrays were sold. EMC (including Dell OEM) had #1 share with 24.2%. The market grew at 96%, and we grew somewhat faster than the market.
But that pales in comparison with the roughly $6.5 billion of non-iSCSI based SAN storage sold during the same period, the vast majority of which is FC. Put differently, iSCSI is about 5% of the overall SAN storage array market.
So where are these iSCSI arrays going?
Mostly, places where there’s no FC installed and it’s a greenfield choice for the customer. Smaller shops where IT skills (and budget) are limited are a good fit for iSCSI. In large enterprises, non-core SAN deployments (maybe test/dev environments, or smaller sites) have seen some minimal pickup.
But I think it’s safe to say – iSCSI hasn’t cracked the big time: large enterprises that – in total – spend several billion dollars a year on FC-related technologies. And it doesn’t look like that’s going to change in the near future.
So why is that? And therein lies the story ...
The first barrier – platform support
If you’re going to do iSCSI, you’re going to want a storage array that does it natively. The first “solutions” involved converters between FC and iSCSI protocols. Yuck: cost, complexity, performance, and so on. No one really took those seriously.
But the product barrier soon fell. As an example, EMC storage products have all done iSCSI natively for quite a while. Interestingly enough, the high-end DMX came out first, followed quickly by mid-tier CLARiiON and Celerra NAS. Didn’t exactly fly off the shelves as we might have hoped …
The second barrier – platform ecosystem support
Most people realize that storage products work in an ecosystem of HBAs, drivers, management tools, and so on. Shortly after the storage platform products became available, it became pretty clear that there was more work to do in the ecosystem.
Things like stable drivers that supported all the functionality found in FC, including important features like SAN boot. (By the way, these drivers come from the OS vendors, not EMC; we do the testing and qualification via eLab.)
Or SRM tools like ControlCenter that would let you manage your iSCSI SAN the same way you managed your FC SAN.
But these pieces fell into place over time. Today, many popular OSes have a reasonable iSCSI driver. And there’s decent support for iSCSI in the EMC SRM products.
Still, iSCSI wasn’t flying off the shelves and into customer shops. What were we missing?
Services do matter
We hit our first real “aha” moment when customers started to approach us about designing larger iSCSI environments. We’re not talking about a couple of arrays and a dozen or so servers, we’re talking about people who wanted to get serious with iSCSI.
Turned out that the way you design an IP network for storage protocols (e.g. iSCSI) is somewhat different from how you’d design one for other use cases. As an example, most data networking applications can tolerate long latency and the occasional bout of network congestion, retrying patiently until they get a response.
Well, in the storage world, servers and applications aren’t too patient when they do a block-oriented transfer. They expect the storage to be there, and to be quick; otherwise they might throw a nasty error, or simply crash. There’s no accepted standard for storage response windows, so every environment ended up being a bit different.
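To illustrate the difference in temperament, here’s a toy Python sketch (made-up latency numbers and timeout, not any real initiator’s logic) of how a tight block-I/O deadline turns ordinary network congestion into a hard error:

```python
import random

# A toy model of why storage traffic tolerates congestion so much worse
# than ordinary data traffic: the block layer enforces a short deadline,
# and a miss becomes an I/O error that applications often treat as fatal.
# All numbers here are invented for illustration.

def simulated_latency_ms(congested: bool) -> float:
    """Healthy SAN round-trip time, plus a big penalty under congestion."""
    base = random.uniform(0.5, 2.0)
    return base + (random.uniform(200.0, 800.0) if congested else 0.0)

def block_read(timeout_ms: float, congested: bool) -> str:
    """Succeed if the I/O completes inside the deadline, else error out."""
    return "ok" if simulated_latency_ms(congested) <= timeout_ms else "I/O error"

random.seed(42)  # reproducible run
print(block_read(timeout_ms=100.0, congested=False))  # completes in time
print(block_read(timeout_ms=100.0, congested=True))   # misses the deadline
```

The numbers are invented, but the asymmetry is the point: a browser would shrug and retry, while a filesystem reports a failed read to the application.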
Not only did we have to do a lot of internal testing and qualification, we had to bring two related disciplines to the table: a storage discipline, and an IP networking discipline. We could find lots of people who could do one of the two, but very few who could do both at the same time. And our customers had the same problem as well.
The answer turned out to be a new storage networking practice that blended storage requirements around predictability with IP networking technology. It’s a new and growing area for us, and I think it’ll be more important in the future.
OK, we had the expertise, but then we hit another hard reality.
Legacy investment matters more than you think
FC technology has been around for almost ten years now. Not only have customers invested many billions of dollars in equipment, they’ve probably invested a much larger sum in skills, processes and procedures to make it all work.
And, outside of a potential cost disadvantage, there’s nothing wrong with FC SANs. They work as advertised. And as the saying goes, “don’t fix it if it ain’t broke”.
Outside of a compelling external event, the natural (and logical) tendency is to buy more of what you already have, rather than biting the bullet and building out an entirely new technology stack.
One customer made it very clear to me through an analogy. “Yes, let’s assume that Macs are better. We’re just not going to swap out 10,000 PC users.”
It sounds pretty obvious in retrospect, but at the time the revelation was very sobering.
Despite the obvious economic advantages on the hardware side, the people side of the equation not only negated any economic impact, but pretty much ensured that people who had started with FC would largely stay with FC for the time being. We could give the stuff away, and it wouldn’t matter.
And then there’s the political angle
I know this is going to come as a surprise to you, but sometimes various factions within IT organizations compete a bit. I know you’re shocked and horrified, but it has been known to happen from time to time.
As it stands today, the storage guys don’t have much shared turf with the network guys. The storage guys have their technology, the network guys have theirs.
Throwing iSCSI into the mix can re-open the whole discussion of “who controls the network”. After all, it’s a common technology set, why shouldn’t it be managed centrally?
Other than the usual control issues, there’s a more practical concern: do networking guys really understand how storage is different? My current take is that the answer is “no”, which provides some basis for keeping things separate, at least for the time being.
Put it all together, and it's pretty clear why iSCSI has stalled in larger enterprises. It's not an evil conspiracy by storage vendors. It's customer choice -- plain and simple.
So what’s ahead?
First, I think you’ll see more of the same: iSCSI in smaller, greenfield SAN builds where FC isn’t entrenched yet. But from small acorns mighty oak trees grow.
And as these environments grow, I think you’ll see more focus put on things like support, performance, management tools, etc. A part of the existing iSCSI market will grow into having enterprise-like concerns. At least we’ll be well prepared …
But is there a triggering event out there which will force larger enterprises to revisit their approach to storage networking?
One unpleasant candidate would be a massive reduction in IT budgets. The last time this happened, there was a lot of interest in iSCSI, but then spending levels recovered, and the discussion moved on.
There’s also a chance that the renewed focus on data center energy reductions will help in some way. Servers that use on-board ethernet use less energy than those that use an add-on FC card (or two!), besides being smaller (think very dense blade servers). Don’t know if the discussion will progress to that level, but it’s possible.
Going out a bit farther, I believe that further cost reductions in 10Gb ethernet technology will encourage people to take another look at an alternative to FC. Once it starts becoming standard on server motherboards, and port costs come down to near where 1Gb technology is today, the discussion will open up again.
Farther out still, many vendors (including Cisco) are working on enhancements to 10Gb technology that create the potential for a converged approach to data center networking: one set of technologies that can do server-user networking, server-storage networking, and – increasingly – server-server networking for grid architectures.
Now, there’s a long gap between having the technology (which is not here yet) and having people use it. So it’s unclear to me whether the emergence of enhanced 10Gb technology that can be used for most everything in the data center will be enough to get larger enterprises moving in this direction, but it’s definitely a possibility.
Lessons learned …
So, I think I’ve learned a lot from watching this all unfold.
- I’ve learned that a product ecosystem is probably just as important as the product itself.
- I’ve learned that – without specific services – no market will move.
- I’ve learned that switching costs – in all their convoluted glory – can often outweigh any economic benefit of a new technology.
- I’ve learned that it’s better to take a new replacement technology and focus it on customers with unmet needs, rather than spend the effort to get happy customers to switch.
And, most of all, I’ve learned to never, never predict that “this is the year of iSCSI”.