We’ve been talking about it in the storage business for at least five years, maybe more.
On the surface, it’s got almost everything going for it: good performance, cost-effectiveness, the ability to leverage existing network investments, a strategic roadmap to 10Gb, and so on.
You’d think it’d be pervasive in large enterprise shops these days. But it’s not.
It’s been on the list of industry predictions every year since 2004.
This is the year of iSCSI. No, wait, this is the year. Really guys, it’s going to happen this year. And so on.
But that hasn’t exactly happened – and it’s not for the reasons most people think.
I think iSCSI serves as an interesting case example of how change really plays out in the IT industry, and that things may not be what they first appear. There are lessons to be learned all around – not only for customers, but for IT vendors as well.
Context
At first glance, iSCSI is pretty attractive. Not to oversimplify, but it's a protocol that enables the fundamental storage command set (SCSI) to work over an IP (ethernet) link.
All of us in the storage industry thought this was pretty cool.
Customers could save big money by avoiding the purchase of fibre-channel HBAs.
They could use lower-cost ethernet switches rather than their more pricey FC counterparts.
And there was the potential of a common technology and skill set to support both data networking and storage networking.
Certainly something worth pursuing.
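(For the curious, here’s roughly what this looks like from the host side. It’s just a minimal sketch assuming a Linux server with the standard open-iscsi initiator – the portal address and target name below are made-up placeholders, not anything product-specific.)

```python
# Minimal sketch: attach an iSCSI LUN from a Linux host using the standard
# open-iscsi initiator (iscsiadm). The portal and IQN are hypothetical.
import subprocess

TARGET_PORTAL = "192.168.10.50:3260"           # hypothetical array portal (IP:port)
TARGET_IQN = "iqn.2006-01.com.example:array1"  # hypothetical target name

def run(cmd):
    """Run a command, raise on failure, and return its output."""
    return subprocess.run(cmd, check=True, capture_output=True, text=True).stdout

# 1. Discover the targets the array advertises; ordinary TCP/IP, no FC HBA needed.
print(run(["iscsiadm", "-m", "discovery", "-t", "sendtargets", "-p", TARGET_PORTAL]))

# 2. Log in; the kernel then presents the LUN as a regular SCSI block device
#    (e.g. /dev/sdX) that can be partitioned and mounted like any local disk.
run(["iscsiadm", "-m", "node", "-T", TARGET_IQN, "-p", TARGET_PORTAL, "--login"])
```

The point is simply that the LUN shows up as an ordinary SCSI disk, with nothing but ethernet in between.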
And so we – and every other storage vendor – started to invest heavily in bringing iSCSI solutions to market. Let me take you through the journey of what we did, and what we learned.
The Red Herrings
There were some red herrings early on in the discussion – concerns that eventually proved to be irrelevant.
At one time, the default assumption was that customers might need TOEs (TCP/IP Offload Engines) in their servers, which would mean new technology and higher costs. Not good.
Well, servers became faster than ever, and it soon became clear that there wasn’t going to be much need for a dedicated accelerator card to make iSCSI work well.
There’s also the perceived performance issue. Hey, iSCSI runs over 1Gb ethernet, and FC is either 2Gb or 4Gb, so doesn’t that make FC proportionally faster?
Well, not really. I joke that the electrons go just as fast over both.
On a more serious note, storage performance boils down to response time and transfer rates. Response time is roughly similar for both (sometimes iSCSI can be faster), and transfer rates are the same until you exceed the bandwidth of the pipe. At that point, you’re either interested in a faster pipe (read: FC) or an additional pipe (read: iSCSI) and spreading the load.
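To put some very rough numbers on that, here’s a back-of-the-envelope sketch. The 8KB I/O size and the 5ms disk service time are illustrative assumptions, not measurements:

```python
# Back-of-the-envelope: for a typical small block I/O, time on the wire is tiny
# compared to the disk service time, so a faster pipe barely changes response time.
IO_SIZE_BYTES = 8 * 1024     # assume a typical 8KB I/O
DISK_SERVICE_MS = 5.0        # assumed array/disk service time (illustrative)

def wire_time_ms(link_gbps):
    """Serialization time for one I/O over a link of the given raw speed."""
    return (IO_SIZE_BYTES * 8) / (link_gbps * 1e9) * 1000

for name, gbps in [("1Gb iSCSI", 1), ("2Gb FC", 2), ("4Gb FC", 4)]:
    wire = wire_time_ms(gbps)
    print(f"{name}: wire {wire:.3f} ms, total ~{DISK_SERVICE_MS + wire:.3f} ms")

# Roughly 0.066 ms vs 0.016 ms of wire time: noise next to 5 ms of disk latency.
# The pipe only starts to matter once you saturate its bandwidth.
```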
As we matched up iSCSI performance against our rather extensive knowledge of real-world workloads, it became pretty clear that a significant majority of applications could run comfortably on iSCSI with no negative performance impact whatsoever.
And there’s been good traction – in some segments
If you segment the storage market into different pieces, you’ll find segments where iSCSI is popular and growing in strength.
As an example, if I’m interpreting the IDC numbers correctly, in the first three quarters of 2006, $369 million of iSCSI arrays were sold. EMC (including Dell OEM) had #1 share with 24.2%. The market grew at 96%, and we grew somewhat faster than the market.
But that pales in comparison with the roughly $6.5 billion of non-iSCSI based SAN storage sold during the same period, the vast majority of which is FC. Put differently, iSCSI is about 5% of the overall SAN storage array market.
So where are these iSCSI arrays going?
Mostly, places where there’s no FC installed and it’s a greenfield choice for the customer. Smaller shops where IT skills (and budget) are limited are a good fit for iSCSI. In large enterprises, non-core SAN deployments (maybe test/dev environments, or smaller sites) have seen some minimal pickup.
But I think it’s safe to say – iSCSI hasn’t cracked the big time: large enterprises that – in total – spend several billion dollars a year on FC-related technologies. And it doesn’t look like that’s going to change in the near future.
So why is that? And therein lies the story ...
The first barrier – platform support
If you’re going to do iSCSI, you’re going to want a storage array that does it natively. The first “solutions” involved converters between FC and iSCSI protocols. Yuch: cost, complexity, performance, and so on. No one really took those seriously.
But the product barrier soon fell. As an example, EMC storage products have all done iSCSI natively for quite a while. Interestingly enough, the high-end DMX came out first, followed quickly by mid-tier CLARiiON and Celerra NAS. Didn’t exactly fly off the shelves as we might have hoped …
The second barrier – platform ecosystem support
Most people realize that storage products work in an ecosystem of HBAs, drivers, management tools, and so on. Shortly after the storage platform products became available, it became pretty clear that there was more work to do in the ecosystem.
Things like stable drivers that supported all the functionality found in FC, including important features like SAN boot. (By the way, these drivers come from the OS vendors, not EMC – we do the testing and qualification via eLab.)
Or SRM tools like ControlCenter that would let you manage your iSCSI SAN the same way you managed your FC SAN.
But these pieces fell into place over time. Today, many popular OSes have a reasonable iSCSI driver. And there’s decent support for iSCSI in the EMC SRM products.
Still, iSCSI wasn’t flying off the shelves and into customer shops. What were we missing?
Services do matter
We hit our first real “aha” moment when customers started to approach us about designing larger iSCSI environments. We’re not talking about a couple of arrays and a dozen or so servers, we’re talking about people who wanted to get serious with iSCSI.
It turned out that the way you design an IP network for storage protocols (e.g. iSCSI) is somewhat different from the way you might do it for other use cases. As an example, most data networking applications are OK with longer latency, the occasional bout of network congestion, or retrying patiently until they get a response.
Well, in the storage world, servers and applications aren’t so patient when they do a block-oriented transfer. They expect the storage to be there, and to be quick – otherwise they might throw a nasty error, or simply crash. There’s no accepted standard for storage response windows, so every environment ended up being a bit different.
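To give a flavor of what that means in practice, here’s a toy sketch of the kind of latency sanity check such a design implies. The portal address, the response budget and the sample count are all assumptions – precisely because there is no standard window:

```python
# Toy sketch: probe round-trip latency to a storage portal and flag anything
# outside an (assumed) per-environment response budget.
import socket
import statistics
import time

PORTAL = ("192.168.10.50", 3260)   # hypothetical iSCSI portal (IP, TCP port)
BUDGET_MS = 2.0                    # assumed acceptable round-trip budget
SAMPLES = 20

def tcp_rtt_ms(addr):
    """Measure one TCP connect round trip to the portal, in milliseconds."""
    start = time.perf_counter()
    with socket.create_connection(addr, timeout=1.0):
        pass
    return (time.perf_counter() - start) * 1000

rtts = [tcp_rtt_ms(PORTAL) for _ in range(SAMPLES)]
print(f"median {statistics.median(rtts):.2f} ms, worst {max(rtts):.2f} ms")
if max(rtts) > BUDGET_MS:
    print("WARNING: storage path exceeds the response budget; block I/O will notice")
```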
Not only did we have to do a lot of internal testing and qualification, we had to bring two related disciplines to the table: a storage discipline, and an IP networking discipline. We could find lots of people who could do one of the two, but very few who could do both at the same time. And our customers had the same problem as well.
The answer turned out to be a new storage networking practice that blended storage requirements around predictability with IP networking technology. It’s a new and growing area for us, and I think it’ll be more important in the future.
OK, we had the expertise, but then we hit another hard reality.
Legacy investment matters more than you think
FC technology has been around for almost ten years now. Not only have customers invested many billions of dollars in equipment, they’ve probably invested a much larger sum in skills, processes and procedures to make it all work.
And, outside of a potential cost disadvantage, there’s nothing wrong with FC SANs. They work as advertised. And as the saying goes, “don’t fix it if it ain’t broke”.
Outside of a compelling external event, the natural (and logical) tendency is to buy more of what you already have, rather than biting the bullet and building out an entirely new technology stack.
One customer made it very clear to me through an analogy. “Yes, let’s assume that Macs are better. We’re just not going to swap out 10,000 PC users.”
It sounds pretty obvious in retrospect, but at the time the revelation was very sobering.
Despite the obvious economic advantages on the hardware side, the people side of the equation not only negated any economic impact, but pretty much ensured that people who had started with FC would largely stay with FC for the time being. We could give the stuff away, and it wouldn’t matter.
And then there’s the political angle
I know this is going to come as a surprise to you, but sometimes various factions within IT organizations compete a bit. I know you’re shocked and horrified, but it has been known to happen from time to time.
As it stands today, the storage guys don’t have much shared turf with the network guys. The storage guys have their technology, the network guys have theirs.
Throwing iSCSI into the mix can re-open the whole discussion of “who controls the network”. After all, it’s a common technology set, why shouldn’t it be managed centrally?
Other than the usual control issues, there’s a more practical concern: do networking guys really understand how storage is different? My current take is that the answer is “no”, which provides some basis for keeping things separate, at least for the time being.
Put it all together, and it's pretty clear why iSCSI has stalled in larger enterprises. It's not an evil conspiracy by storage vendors. It's customer choice -- plain and simple.
So what’s ahead?
First, I think you’ll see more of the same: iSCSI in smaller, greenfield SAN builds where FC isn’t entrenched yet. But from small acorns mighty oak trees grow.
And as these environments grow, I think you’ll see more focus put on things like support, performance, management tools, etc. A part of the existing iSCSI market will grow into having enterprise-like concerns. At least we’ll be well prepared …
But is there a triggering event out there which will force larger enterprises to revisit their approach to storage networking?
One unpleasant candidate would be a massive reduction in IT budgets. The last time this happened, there was a lot of interest in iSCSI, but then spending levels recovered, and the discussion moved on.
There’s also a chance that the renewed focus on data center energy reductions will help in some way. Servers that use on-board ethernet draw less power than those that use an add-on FC card (or two!), besides being smaller (think very dense blade servers). I don’t know if the discussion will progress to that level, but it’s possible.
Going out a bit farther, I believe that further cost reductions in 10Gb ethernet technology will encourage people to take another look at an alternative to FC. Once it starts becoming standard on server motherboards, and port costs come down to near where 1Gb technology is today, the discussion will open up again.
Farther out still, many vendors (including Cisco) are working on enhancements to 10Gb technology that create the potential for a converged approach to data center networking: one set of technologies that can do server-user networking, server-storage networking, and – increasingly – server-server networking for grid architectures.
Now, there’s a long gap between having the technology (which isn’t here yet) and having people actually use it. So it’s unclear to me whether the emergence of enhanced 10Gb technology that can be used for most everything in the data center will be enough to get larger enterprises moving in this direction – but it’s definitely a possibility.
Lessons learned …
So, I think I’ve learned a lot from watching this all unfold.
- I’ve learned that a product ecosystem is probably just as important as the product itself.
- I’ve learned that – without specific services – no market will move.
- I’ve learned that switching costs – in all their convoluted glory – can often outweigh any economic benefit of a new technology.
- I’ve learned that it’s better to take a new replacement technology and focus it on customers with unmet needs, rather than spend the effort to get happy customers to switch.
And, most of all, I’ve learned to never, never predict that “this is the year of iSCSI”.
Chuck, great post. The bits that stick for me are: political – I see that issue of network vs. storage guys as a big thing; I've done some FCIP recently and getting a straight answer out of the network team isn't easy. On the hardware front, I agree power will be an issue, but I think more with the switch itself. I've posted previously on the Gb/s per watt the major vendors provide (Brocade was best). Virtualisation will go some way to reducing the argument about power consumption in servers. Most of all I think it is the change thing. The environments I work in have a massive aversion to change – the implication that things will stop working, work incorrectly and basically put the business at risk. That can be something as simple as moving to another NAS supplier...
Posted by: Chris M Evans | January 03, 2007 at 03:57 AM
Chuck, I disagree that "the year of iSCSI" hasn't come. I think it happened in 2004.
Details: http://blogs.netapp.com/dave/ThinkingOutLoud/2007/01/07/The-Year-of-iSCSI.html
Dave
Posted by: Dave Hitz | January 08, 2007 at 12:59 AM
I saw your post, Dave, and I see your point. The benchmark you chose was $100m of revenue, which is not entirely unreasonable.
And I can't argue that iSCSI is not a healthy market for EMC, NetApp and lots of other folks.
The benchmark I chose was different than yours -- I chose to focus on customer adoption rather than arbitrary revenue milestones. $100m against a backdrop of tens of billions doesn't represent a broad-based shift in customer thinking, IMHO.
I think it's fair to say that -- at one time -- many of us thought that people using FC would start to consider iSCSI as a serious alternative, and we'd see broader adoption.
That hasn't happened, and I thought it was an interesting lesson for many of us, now somewhat obvious in retrospect.
My other goal was to dispel the occasional notion that, somehow, major vendors like EMC were "holding back" on iSCSI. Actually, the opposite is true -- if anything, we've overinvested rather than underinvested.
Enjoy reading your blog, Dave!
Posted by: Chuck Hollis | January 08, 2007 at 08:28 AM
Dave,
I agree with many of the things you are stating in your blog – specifically, that for several years running it was incorrectly predicted that the next year would be "the year of iSCSI". However, take a look at the following figures mentioned at Storage Networking World for the number of iSCSI implementations:
Fall 2005: 4,500
Spring 2006: 12,000
Fall 2006: 20,000 (with predicted 30,000 by year end).
These figures clearly show that the iSCSI adoption rate is increasing fast. However, from a market-size point of view, iSCSI does not even come close to the FC market (yet).
Posted by: Wolfgang Singer | January 19, 2007 at 03:25 PM
I can't disagree that iSCSI is growing fast, or that there are many implementations.
The growth figures are off a very small base, and -- again -- today, around 5% of the SAN market is iSCSI. That's enough to sober even the most ardent advocate.
More importantly, these implementations aren't at the heart of corporate IT. Heck, my perception is that the topic isn't even up for discussion.
And that's the point I wanted to make.
BTW, this post kick-started a very vibrant debate. All good for the industry.
Thanks for the post ...
Posted by: Chuck Hollis | January 19, 2007 at 04:04 PM
Dave,
Having watched this market since the beginning, I think most of the time we miss one important point in the yes/no iSCSI debate. If you look at iSCSI purely as a potential replacement for FC, I completely agree with you. But I think iSCSI puts more on the table than just a shift in connectivity. If you look at the startups, at NetApp, or at EMC with the Celerra, the value proposition of these iSCSI solutions is really built around a virtualisation engine. You no longer have a direct link between the exported virtual LUN and the physical drives; the LUN takes the blocks it needs from a pool of storage. The immediate benefit is simplicity – not in the GUI, but in the management of the array: the array takes care of the underlying optimisation of the disks. As an analogy, MS SQL did not take market share from Oracle because the product was better (it was far behind), but because the same IT guy was now managing the network, OS, applications... and not just a DBA anymore. I think the same thing is happening to storage, and iSCSI is the way to do it. It is not the end of FC; it is just that the next wave will not be FC.
Posted by: Christophe Baranger | January 23, 2007 at 10:16 AM
I'm not sure comparing iSCSI and FC revenues is the right comparison. There is no doubt FC Symmetrixes drive more revenue than iSCSI Celerras. The same likely applies even in a CLARiiON environment: FC-attached systems are likely richer configs than iSCSI.
FC is the preferred mission critical data center storage transport. iSCSI is popular with distributed clients, such as diskless PCs, distributed web-servers, etc. In some cases iSCSI has enabled a new capability (diskless PC boot), and in some cases it is used as an alternative to NAS.
I have also heard iSCSI is popular in Microsoft environments because network shares are not allowed for some applications (Exchange mail stores), but iSCSI allows Ethernet attached storage.
FCoE is not a panacea. It cannot be routed (unless an Ethernet FC router is introduced into the Ethernet network, which seems kludgy), so it cannot replace iSCSI, which is typically used in a routed IP environment.
FCoE may require new NICs to support L2 reliability.
So iSCSI and FCoE will have to coexist. So much for converged networks.
If Ethernet becomes the grand unifying transport, it opens up three options for the datacenter: FCoE, iSCSI (including iSER), and networked file systems (including NFSoRDMA, pNFS, pNFSoRDMA, etc.).
FCoE is not innovation. It is the same thing repackaged. As was iSCSI. As is iSER.
NFSoRDMA is interesting, as it eliminates the big problem with NFS (CPU load).
pNFS and pNFSoRDMA seem truly game changing.
My guess is that five years from now, all of us will look back and say "Ethernet took over, but things didn't turn out the way I predicted in 2007."
But for EMC, spinning rust will still be spinning rust five years from now, so you will win regardless of who wins the coming protocol wars.
Posted by: meh130 | April 23, 2007 at 06:09 PM
Hi, I just found this - quite randomly - while searching for benchmarks on FC-iSCSI routers.
I'll put down a quite personal opinion here and will try to reason it out, too.
"So why is that? And therein lies the story ..."
plain and simple:
It is because iSCSI sucks.
Disclaimer: I've been toying around with iSCSI for quite a few years now. I've sought out dark channels to get a hold of the Cisco-branded software initiators, I've done plain iSCSI from W2K to a NetApp filer the moment M$ released the beta initiator, I've even had databases on it just for testing, and I've even gone through the madness of using it on HP-UX and AIX.
- Platform support issue: not true - Cisco had everything covered in 2003 already. Real fact: they stopped covering everything because no one was interested. Current software initiators work for just about everything (OK, FreeBSD might lack encryption, but as long as no one even notices, it seems to be a non-issue)
- Performance gap being irrelevant:
Not true. This IS what IT managers will care about. In real-world datacentres people are happy to see 150MB/s coming out of dual 4Gb links, and they absolutely EXPECT an I/O overhead for this of no more than 2% CPU power.
What you propose to them is roughly the following scenario: "Oh, given dual GigE links you'll easily see 60MB/s on certain transactions at a negligible CPU usage of roughly 25% unencrypted and just slightly reduced network bandwidth"
Numbers:
one of your CPUs + service contract easily runs $10,000+ - suddenly waste 10-25% of that on overhead?
you, in fact, need 2 additional GigE LANs to match the reliability of your current SAN
you need to use extremely high-end-specced LAN switches to REALLY match the reliability (think ISL == dual-trunked cross-switch VLANs). Those switches cost a lot more than a few cheap Brocades / Ciscos.
- "As we matched up iSCSI performance against our rather extensive knowledge of real-world workloads, it became pretty clear that a significant majority of applications could run comfortably on iSCSI with no negative performance impact whatsoever."
shouldn't the lower layers of an infrastructure match up to potential workloads? Or should the customer rather add an FC adapter and remap the LUN to FC once peak loads occur? Like a database backup that'd take just 50% longer? Or do we suggest the DMX disk mappings are made in a way that concurrent I/O will keep the bandwidth under our 'threshold'?
Then comes security - encryption? Nice thing. Just a slight CPU overhead for those 65MB/s without an offloading HBA. If you buy the HBA, the cost benefit starts wearing off. Same if you build separate LANs, not to mention that zoning is slightly less complicated than using 1000-2000 VLANs for those few hosts you have.
- Lack of integrated enterprise-level tools. Large environments won't be straightforward to manage, meaning HUGE administrative costs - not just training, but constant overhead. No issue for toying around, making showcases and customer presentations. But will you be liable for managing a few thousand iSCSI LUNs using self-made scripts or the filer GUI? (sorry Dave, but I didn't cancel DataFabric Manager :p)
- Hell, without a dedicated HBA you can't even use it to boot a system reliably, and even if you have a dedicated HBA there's no multipathing, etc., which equals a lot of tiny SPOFs in consolidated environments.
- the 10ge argument:
People have run iSCSI over 10Gbit ethernet, and they cheered as they reached around 600MB/s sustained throughput.
The funny thing is, a friend of mine used to do movie post-production, and those 600MB/s are just about what a 1997 SGI Octane could push through the FC HBAs they had in it.
1. iSCSI might come to a better reputation once the performance catches up to post-2000 numbers
2. Working encryption, boot and multipathing at enterprise scale might help. Think SCSI reservations, automatic handling of 1000s of encryption sessions, certificates, etc. That's when it works - because that is what the enterprises already *got*.
3. Any actual new features might help, too.
Florian
P.S.: Thanks a lot - at least it includes iSNS.
Posted by: Florian Heigl | July 11, 2007 at 12:18 AM
Hi Florian, and thanks for the commentary.
Have you had a chance to look at FCoE and offer a perspective?
Thanks
Posted by: Chuck Hollis | July 12, 2007 at 01:42 AM