Now that FCoE and data center Ethernet are standard fixtures in the strategy discussion, they're raising some interesting organizational questions.
In a converged fabric world, who gets to run the fabric?
And it's turning out to be a more complex discussion than I first thought.
A Bit Of Background
If we look inside data centers today, we'll usually find three distinct classes of networks, and (usually) three different groups responsible.
The storage guys run the server-to-storage network, using a mix of FC and IP technologies. They design, implement, manage and evolve this particular part of the landscape. Put simply, they "own" it as part of their mission.
The server guys will often build their own networks for server-to-server communications -- think clustering, failover, etc. They too design, implement, manage and evolve this part of the landscape -- it's part of their mission to provide server resources.
And, finally, we have the traditional network guys who provide connectivity from applications and servers to the outside world. They too design, implement, manage and evolve this investment.
Three different styles of networks. Three different missions. And three separate and distinct ownership points.
Let's Throw In A Converged Fabric, Shall We?
Call it data center Ethernet or whatever, the idea is pretty much the same: take a fast Ethernet transport (10Gb, or maybe 40Gb in the future), and "channelize" it so that different flavors of traffic can run over the same wire, blissfully unaware of the other traffic.
This channel gets used for storage connectivity. This other channel gets used for server-to-server connectivity. And this other channel gets used for access to the outside world. Nice, clean management domains on top of a common transport. And hard isolation exists to keep them from stepping on each other.
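To make the channel idea a bit more concrete, here's a minimal sketch (Python, purely illustrative) of one 10Gb link carved into traffic classes with guaranteed bandwidth shares -- loosely in the spirit of the priority-group and lossless-class enhancements being worked on. The class names, priorities and percentages are my own assumptions, not any vendor's configuration.

LINK_CAPACITY_GBPS = 10  # one converged 10GbE port

# Each "channel" gets a priority group and a minimum bandwidth guarantee.
traffic_classes = {
    "storage (FCoE)":   {"priority": 3, "min_bandwidth_pct": 50, "lossless": True},
    "server-to-server": {"priority": 5, "min_bandwidth_pct": 30, "lossless": True},
    "general LAN":      {"priority": 0, "min_bandwidth_pct": 20, "lossless": False},
}

def validate(classes):
    """The guarantees must not oversubscribe the wire."""
    total = sum(c["min_bandwidth_pct"] for c in classes.values())
    if total > 100:
        raise ValueError(f"Bandwidth guarantees add up to {total}% of the link")

validate(traffic_classes)
for name, cls in traffic_classes.items():
    gbps = LINK_CAPACITY_GBPS * cls["min_bandwidth_pct"] / 100
    print(f"{name:>18}: priority {cls['priority']}, "
          f"guaranteed {gbps:.1f} Gb/s, lossless={cls['lossless']}")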
Server connectivity gets reduced to a pair of Ethernet ports rather than the spaghetti factory we see today -- not only is it cheaper and more dense, it also cuts power consumption and cooling, and there's a lot less to manage.
Data center connectivity might be as simple as "wire once and walk away" -- most reconfigurations of topology and function will be done logically, not physically. Switches and directors get denser and more cost effective. The more you think about it, the more you like what might be possible.
That world isn't entirely here yet, but you can look out a bit and see it on the not-too-distant horizon. And many of the customers I talk to are planning for that world when it gets here.
So who gets to run this converged fabric?
The Network Guys?
At first blush, most people would say "well, since it's Ethernet, the network guys should run the converged fabric". But that perspective focuses only on the underlying technology, not the use cases.
As an example, storage people usually have a pretty good idea of how their storage networks get used. They think about things like latency and bandwidth. They think about things like multipathing. They know that the server-to-storage relationship isn't necessarily one-to-one. They know that things can get changed around a lot. And they know just how much a storage networking problem can impact the business.
As another example, server people have very specific needs when doing server-to-server connectivity. They're thinking in terms of clustering. Latency can be very important, rather than just bandwidth.
And for both teams, retries on the network are not usually a good thing.
So, in this converged fabric world, how do we ensure that a shared network meets the needs of particular use cases -- in this case, storage networks and server-to-server networks?
The Anecdotal History Isn't Good
I don't want to generalize here, but as I think back over the years, I can remember many examples of when network worlds collided, with all sorts of friction as a result.
I've seen frustrating exercises result when the storage team sits down with the networking team to discuss remote replication, as an example. The two teams think in terms of different concepts and priorities. Finding people who are comfortable in both worlds is difficult.
As part of EMC's enterprise NAS business, we get involved in all sorts of situations where a "NAS problem" turns out to be a data networking problem. From the data networking team's perspective, everything looks acceptable -- but since that same network is being used to access storage, the demands are different.
And bringing the two teams together can be an exercise in itself.
I'm not as close to the intersection between server teams and networking teams. Maybe they don't need to talk? All I can offer is a few anecdotal stories about server teams' choice of InfiniBand causing all sorts of consternation from the data networking team.
But, as we see larger and larger computing clusters (think large VMware farms, for example) I'm sure these paths will cross more often in the future.
Simply turning things over to the data networking teams isn't going to be practical for many larger organizations. Sure, there will be exceptions to the trend, but I think there will be substantial resistance to a single converged fabric being run entirely by a single data networking team.
So, What Might Be The Answer?
I think we'll see a layered approach. The data networking teams will be responsible for the underlying transport, assigning "channels" with the required connectivity, bandwidth, latency and resiliency -- based on input from the "specialty" (server and storage) teams.
Once those lanes have been carved out, I think it's best that each specialty team manages its own resources, using its own tools. Storage people want to use storage tools to manage storage. Server people want to use server tools to manage servers.
And, if it's done right, no one will step on anyone.
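To illustrate that hand-off, here's a rough sketch: the specialty teams describe channels in the terms they care about (bandwidth, latency, loss tolerance), and the fabric owner accepts a request only if the shared transport can honor it. The data structures and the simple capacity check are illustrative assumptions, not a real provisioning API.

from dataclasses import dataclass

@dataclass
class ChannelRequest:
    owner: str             # which specialty team manages this channel
    bandwidth_gbps: float  # minimum guaranteed bandwidth
    max_latency_us: int    # latency target the use case can tolerate
    lossless: bool         # e.g. storage traffic can't tolerate drops and retries

class ConvergedFabric:
    def __init__(self, capacity_gbps: float):
        self.capacity_gbps = capacity_gbps
        self.channels: list[ChannelRequest] = []

    def allocate(self, req: ChannelRequest) -> bool:
        """Fabric owner's call: accept the channel only if the transport can honor it."""
        committed = sum(c.bandwidth_gbps for c in self.channels)
        if committed + req.bandwidth_gbps > self.capacity_gbps:
            return False  # push back to the requesting team rather than overcommit
        self.channels.append(req)
        return True

# The specialty teams hand their requirements to the fabric owner.
fabric = ConvergedFabric(capacity_gbps=10)
fabric.allocate(ChannelRequest("storage team", 5.0, max_latency_us=50, lossless=True))
fabric.allocate(ChannelRequest("server team", 3.0, max_latency_us=20, lossless=True))
fabric.allocate(ChannelRequest("network team", 2.0, max_latency_us=500, lossless=False))
for c in fabric.channels:
    print(f"{c.owner}: {c.bandwidth_gbps} Gb/s guaranteed, lossless={c.lossless}")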
Drilling down a bit on how large storage environments are run today, there's a certain similarity. The storage team comes up with a design, or a modification, and hands it off to a separate team to actually make the physical changes -- presumably in an FC world.
That same workflow is still valid -- the storage design team comes up with a design or modification, and now hands it off to the data networking guys to create the desired (logical) topology with the desired characteristics.
Sure, it'll be bumpy at first -- isn't all new stuff? -- but I can see this working out pretty well before too long.
Until it doesn't ...
Converged Fabric May Give Way To Converged Management
Going out a bit farther, there's a gleam in the industry's eye about what new management models might be possible (and very attractive) in a converged fabric world.
Let's talk about something as simple as provisioning a new application. The server guys configure resources and load a template. The storage guys allocate space and configure the required data protection. And the data networking guys configure connectivity to the outside world.
What happens when all of this is hanging off essentially the same fabric?
More than a few people think there's going to be a powerful motivation for a new flavor of converged management: for example, one tool that can allocate (virtual) server resources, load up a template, allocate storage capacity and protection, and make all the connectivity work between the players.
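Here's a rough sketch of what that single workflow might look like -- compute, storage and connectivity provisioned in one pass instead of three separate tickets. Every name and parameter below is a hypothetical placeholder; no real orchestration product or API is implied.

def allocate_compute(app: str) -> dict:
    # What the server team does today: carve out resources, load a template.
    return {"app": app, "vcpus": 4, "memory_gb": 16, "template": "web-app-baseline"}

def allocate_storage(app: str) -> dict:
    # What the storage team does today: allocate space, configure protection.
    return {"app": app, "capacity_gb": 500, "protection": "remote-replication"}

def configure_connectivity(app: str, compute: dict, storage: dict) -> dict:
    # What the networking team does today: connect the players on the fabric.
    return {"app": app, "storage_channel": "lossless", "external_ports": [443]}

def provision_application(app: str) -> dict:
    """One console: three linear hand-offs become a single workflow."""
    compute = allocate_compute(app)
    storage = allocate_storage(app)
    network = configure_connectivity(app, compute, storage)
    return {"compute": compute, "storage": storage, "network": network}

print(provision_application("order-entry"))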
Put differently, if everything is essentially a network resource, will it make sense to manage everything as a network resource? Go beyond provisioning and consider service delivery management, security, capacity planning, etc.
That's an intriguing possibility, isn't it?
But in terms of organizational responsibility, we've got an entirely new construct, don't we? I mean, today we've got separate disciplines and largely linear workflows between the groups. What happens when we can put it all on one console?
And, even if we can do it, will people want it?
Courteous comments always welcome ...
It gets worse. With enterprise networks it's always important to differentiate between LAN and WAN, because WANs typically require resources provided by third parties. Let's assume you want to synchronously, remotely replicate a terabyte of production data. Obviously this requires creating a WAN on a leased line. If the WAN isn't implemented perfectly, end user access to data can be impacted pretty severely. So the owner of the WAN connection is:
A) The storage guys, because it's their job to protect critical data and to provide access to data. They can't guarantee either if they don't control the connection.
B) The network guys, because it's their job to contract with third parties to provision lines and also to ensure that they deliver a guaranteed quality of service to their users (in this case, the storage guys).
Posted by: Bill Bonin | September 09, 2008 at 12:12 PM
Great post - a couple of additional considerations:
1) Very few customers have 10Gb Ethernet today, other than possibly at the backbone or for a backup application. For converged fabric success, this transition needs to happen without disrupting the management processes (let's walk before we run).
2) The enhancements to Ethernet (I like "Lossless Ethernet" as a generic term) will all be new to the network guys and will be dictated by the storage guys -- so we'll need to get past the "silo" mentality for planning and maintenance, even if each administrator has their own access and tools.
3) A shared environment will have security implications that today each group (networking, server, and storage) deals with individually.
The FCoE initiative shows great promise for reaching a converged fabric, but does not address WAN/LAN as Bill points out or any of the ultra-low latency (RDMA) needs where InfiniBand is used today.
Posted by: Stuart Miniman | September 09, 2008 at 01:25 PM
Nice post, Chuck. The arguments that David Isenberg made in his article "The Rise of the Stupid Network" (May 1997) still hold today. One of the tenets of the stupid network is "where transport is guided by the needs of the data, not the design assumptions of the network" - in other words, the "network tail" doesn't wag the "data dog".
Posted by: marc farley | September 10, 2008 at 10:36 AM
Excellent post, Chuck. This is going to be an interesting time in many data centers. At a previous position I ran a proof of concept that tested VMware on a consolidated InfiniBand fabric. The results showed there is great opportunity in a solution like this: much faster I/O without the complex configuration and design of current aggregation methods.
So far the integration of storage and networking has been messy at best. The network teams have had to engineer their way around inherent weaknesses in Ethernet to get the I/O that some applications require. I know in my last organization, a large financial institution, they still do not offer iSCSI in the data centers. The teams just don't work well together.
Your point about managing this new fabric is also very relevant. The biggest hesitation I've seen from the network side of the house to integrate in to a new fabric is the lack of true management and monitoring. They still fight to properly manage the data network and now have to almost start from scratch on these new consolidated fabrics.
Posted by: Jason Nash | September 12, 2008 at 05:51 PM