So, it’s no surprise that everyone is doing more with VMware in data centers these days.
Between hardware reduction, power savings, increased flexibility and so on, it’s an article of faith for me that this will accelerate during 2007. It’s just too darn compelling.
So what’s holding customers back from deploying VMware most everywhere?
Several issues, but one important one is infrastructure: to get the most out of server virtualization, we believe the infrastructure around it has to change.
In this post, I’ll take a look at where we see a few important issues arising in more advanced VMware infrastructure, and – naturally – what EMC is doing about it.
Context
First, some background.
Even though VMware is owned by EMC, you’d never know it. For good reasons, it’s run as a completely separate company. This is not PR whitewash. During 2006, I’d offer that IBM, HP et al. had better relationships with VMware in many regards than did EMC.
As I’ve mentioned before, my day job at EMC is running technology alliances, and – not surprisingly – we treat VMware as an independent alliance partner the same way we treat Microsoft, Oracle, SAP, Cisco, RedHat, Novell, et al.
The first time people hear me say this, they think we’re nuts, but when I explain the logic, they can at least understand the rationale.
This is good because we believe an operating environment like VMware needs to be free-standing and independent to meet customer needs and be successful. It also has a bit of a downside for EMC customers who would like “one face” from EMC, which we can’t offer in this regard.
Specifically, EMC doesn’t sell VMware. VMware sells VMware.
This independence goes both ways. Customers don’t want one choice to dictate another.
So EMC can’t limit itself to one and only one server virtualization technology. We’re actively working with Xen, Sun’s Zones, IBM’s LPARs, Microsoft’s eventual offering, etc. etc. etc.
That being said, we’re doing the majority of our integration work with VMware, because – obviously – far and away, that’s what people are using the most.
And we’re learning important lessons here that apply to other server-virtualized environments.
The Core Infrastructure Challenge
Simply put, the central infrastructure challenge is that server virtualization adds another layer in the stack. As an example, instead of server / network / storage, it’s now virtual server / physical server / network / storage.
If the VMware environment isn’t very large, or business critical, this doesn’t present much of an issue. But, as more and more workload gets virtualized, and the environment gets even larger, people are starting to bump into infrastructure issues.
I don’t think that this is VMware’s problem to entirely solve -- they’re not in the infrastructure business. It’s up to larger IT vendors like EMC to go figure this out.
Challenge #1 – Flat Name Space for VMotion
One of the most powerful and sexy features in VMware ESX 3.0 is the advanced capabilities of VMotion, managed by DRS. The ability to dynamically (and automatically!) move and optimize workloads across a pool of servers is nothing short of pure magic, in my book.
But this presents a new challenge to the storage infrastructure. You’re going to want every virtual server image to be able to see every storage object from every server.
One way of describing this is a “flat name space”. If I’m VM image #32, I want to be able to see my stuff (/vmimages/vmimage32) from each and every server in the complex. That way, if VMotion wants to move my container around, I can pick up and run without having to figure out where my storage is.
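To make that concrete, here’s a minimal sketch in plain Python (the host names, mount points and VM record are all hypothetical, not any VMware API): because every host mounts the same export at the same path, every host is a valid migration target, and no storage remapping step is needed when the container moves.

    # A minimal sketch of why a flat name space helps VMotion-style moves.
    # Host names, mount points and the VM record are hypothetical.

    HOSTS = {
        "esx01": {"/vmimages"},   # the same shared export, mounted everywhere
        "esx02": {"/vmimages"},
        "esx03": {"/vmimages"},
    }

    VM = {"name": "vmimage32", "storage_path": "/vmimages/vmimage32"}

    def storage_visible(host, vm):
        """True if this host can already see the VM's files at the same path."""
        return any(vm["storage_path"].startswith(m) for m in HOSTS[host])

    def migration_targets(vm):
        """Every host that can run the VM with no storage remapping step."""
        return [h for h in HOSTS if storage_visible(h, vm)]

    print(migration_targets(VM))   # ['esx01', 'esx02', 'esx03']

The interesting case is the negative one: if some hosts can’t see the VM’s storage, they drop out of that list, and the scheduler’s choices shrink accordingly.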
Now, it's clear that most VMware customers use SAN technology as they move to more advanced VMotion and DRS implementations. One implication is that you'll probably have to expose every LUN to every server, something that might cause a few of us concern.
I think that, in the long term, we'll find high-end NAS much more friendly for high-end VMotion / DRS farms than today's SANs. And I think that NAS has the potential to offer a few benefits that we might not find in the SAN world.
Not only does NAS deliver a flat name space that will probably make implementing VMotion far easier, but you also get other potential benefits that might not be obvious:
• You get to manage a file system, rather than a collection of LUNs
• You get some modicum of access control through the file system mechanisms
• You get access to advanced NAS features, like thin provisioning, snaps, replication, etc.
• You get access to additional ILM, SRM and dynamic storage tiering capabilities
And, as a special added bonus, you get to use low-cost ethernet to connect your servers to your storage. Very nice, especially if you’re looking at blades or high-density racks.
Now, if you’re enamored with this idea, let’s go out a bit further.
You should be thinking about a NAS device that is very scalable, robust and beefy, and probably not the stuff you use for ordinary file serving.
Consolidated VMware servers can drive very dense I/O profiles that are very different from your garden-variety file serving, not to mention that you’ll most likely want things like clustered failover, non-disruptive upgrades to hardware and software, and so on.
And, thinking out even further, you’ll probably want the ability – over time – to gang together multiple NAS devices, present them as a single flat name space, and move things around behind the scenes non-disruptively – file virtualization. Think migrating, tiering, archiving, replication, etc.
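For the file-virtualization piece, here’s a rough sketch of the mechanism (device and path names invented for illustration): clients resolve a logical path through an indirection layer, so data can be re-tiered or migrated behind the scenes by updating the map rather than touching any client.

    # A sketch of file virtualization: one logical name space in front of
    # several NAS devices. Device and path names are made up.

    namespace = {
        "/vmimages/vmimage32": ("nas-a", "/export7/vmimage32"),
        "/vmimages/vmimage33": ("nas-b", "/export2/vmimage33"),
    }

    def resolve(logical_path):
        """Clients only ever see the logical path; this lookup is hidden."""
        return namespace[logical_path]

    def migrate(logical_path, new_device, new_physical_path):
        """Copy the data to another device, then flip the map entry.
        The logical path, and therefore every client, is unaffected."""
        # (the data copy from the old location would happen here)
        namespace[logical_path] = (new_device, new_physical_path)

    migrate("/vmimages/vmimage32", "nas-archive", "/cold1/vmimage32")
    print(resolve("/vmimages/vmimage32"))   # ('nas-archive', '/cold1/vmimage32')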
In the EMC portfolio, these products are EMC’s Celerra for the high-performance NAS gateway (it scales up to 8 blades); behind that, DMX, CX and Centera for tiering; Rainfinity for file virtualization; and then a whole plethora of data protection, ILM and SRM tools.
All ready to go in VMware ESX 3.0 and VMotion environments.
Now, this is not to say that we won't see a lot of SAN in tomorrow's VMware farms. But it's an interesting exercise to think through each, and weigh the pros and cons.
Challenge #2 – Storage Resource Management
Many larger shops have already set up enterprise SRM for all of their servers and storage. How does VI3 fit in here? Yes, VI3 delivers some capabilities within its own VMware world, but what about everything else?
The starting point for enterprise-class SRM is discovery and visualization. What do I have, how does it connect, and how is it all related?
As an example, the primary user interface in EMC’s ControlCenter is a graphical depiction of the server and storage environment, from which the administrator can pivot, drill down, and so on.
It’s important to note that none of the lower-level capabilities (e.g. utilization reporting, performance monitoring, provisioning, etc.) are very effective unless you’ve got this end-to-end view.
To do this, the SRM environment not only needs to discover the entities in the environment, but also to correlate them logically, e.g. this application uses this server uses this HBA which is connected to this port which in turn connects to this set of LUNs which happen to live in this array and these particular disk spindles. Whew!
Now, insert server virtualization into this stack.
What happens? It breaks the connection. Maybe I can see the virtual machines. Or maybe I can see the VMware ESX servers. But, unless some heavy lifting is done, I won’t be able to see that stem-to-stern view that makes enterprise SRM useful. And you can’t manage what you can’t see.
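Here’s a toy model of what that end-to-end correlation looks like (entity names invented; this is the shape of the problem, not ControlCenter output): each layer records what it sits on, and the stem-to-stern view is just a walk down that chain. Remove the virtual-machine-to-physical-host link, which is exactly what a tool that doesn’t understand the hypervisor is missing, and the walk dead-ends at the VM.

    # A toy model of end-to-end correlation. Entity names are invented.

    topology = {
        "app:payroll":      "vm:payroll-vm07",
        "vm:payroll-vm07":  "host:esx01",        # the link virtualization adds
        "host:esx01":       "hba:esx01-hba0",
        "hba:esx01-hba0":   "port:fa-7a",
        "port:fa-7a":       "lun:0x1f3",
        "lun:0x1f3":        "array:dmx-001",
        "array:dmx-001":    "disks:raid-group-12",
    }

    def chain(entity):
        """Walk from an application all the way down to the physical disks."""
        path = [entity]
        while path[-1] in topology:
            path.append(topology[path[-1]])
        return path

    print(" -> ".join(chain("app:payroll")))

    # Drop the VM-to-host link and the walk stops at the virtual machine:
    # the "broken connection" described above.
    del topology["vm:payroll-vm07"]
    print(" -> ".join(chain("app:payroll")))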
At the end of 2005, we recognized this as an important issue, so our ControlCenter team began to work on the non-trivial enhancements needed to make SRM work in a virtual server environment. The key release is the next version of ControlCenter, which should be generally available before too long, but is available as a pre-release sooner.
Is this a big deal? Only if you want enterprise SRM for your VMware servers alongside everything else you own. Otherwise, you can do it piecemeal.
Challenge #3 – Backup and Recovery
Backup and recovery – never a pleasant topic in the physical server world – gets even more thorny and problematic in a virtual server world.
Why is this? Well, for one thing, you want the ability to back up individual virtual machines (and their respective virtual file systems) in addition to backing up entire ESX images and their respective files or disks.
Not only that, but there’s a ton of duplicated data sitting around in VMware environments. Lots of replicated copies of binaries, guest OSes and so on.
A related consideration is bandwidth constraints -- traditional backup/recovery models drive a ton of bandwidth during the backup window, which is usually far more than is experienced during normal operations. You don’t want backup bandwidth forcing you from, say, low-cost ethernet to more expensive Fibre Channel.
If you’re thinking remote replication and DR, that has a few thorny issues as well. Doing replication in the storage array (or storage virtualization device) copies the entire LUN, which might hold the files of several virtual machines that all have to be copied as a unit.
If you opt for doing replication at the individual virtual machine level, or in the ESX console (maybe VCB), there are other tradeoffs, most notably complexity. That might be manageable, but add DRS (moving around VM instances dynamically) and your head will start to hurt after a while.
So what’s the answer? Yes, you can use traditional backup products like NetWorker et al. to do this, but I think there’s a better answer that’s well-timed with the shift to virtual infrastructure.
And that’s data deduplication.
One of the reasons that we acquired Avamar was their “global single instance” data reduction. The client software knows what chunks already exist on the backup farm, and only sends what’s unique. So, if you have several dozen Windows instances (or Linux instances) running in your VMware farm, we’re talking some pretty major data reduction.
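Here’s a minimal sketch of that mechanism in plain Python, not anything Avamar-specific; fixed-size chunks and SHA-256 hashes are simplifying assumptions (real products use smarter, variable-size chunking). The client hashes its chunks, checks what the backup farm already holds, and ships only what’s new.

    # A sketch of client-side, global single-instance backup. Fixed-size
    # chunks and an in-memory "backup farm" keep the example short.

    import hashlib

    CHUNK_SIZE = 64 * 1024
    server_chunks = {}   # stands in for the backup farm's global chunk store

    def chunks(data):
        for i in range(0, len(data), CHUNK_SIZE):
            yield data[i:i + CHUNK_SIZE]

    def backup(data):
        """Return the chunk-hash recipe for an image and the bytes actually sent."""
        recipe, bytes_sent = [], 0
        for piece in chunks(data):
            digest = hashlib.sha256(piece).hexdigest()
            if digest not in server_chunks:   # only unique chunks cross the wire
                server_chunks[digest] = piece
                bytes_sent += len(piece)
            recipe.append(digest)
        return recipe, bytes_sent

    # Two near-identical guest images: the second backup ships almost nothing.
    image_a = b"windows-guest-image-blocks " * 100_000
    image_b = image_a + b"a few unique blocks"
    _, sent_a = backup(image_a)
    _, sent_b = backup(image_b)
    print(sent_a, sent_b)   # the second number is a small fraction of the first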
Another great benefit, besides using far less storage, is that far less bandwidth is needed for de-duped backups. Less bandwidth => better service levels for production => shorter backup windows => more cost-effective storage networking, and so on. It’s the gift that keeps on giving.
And, finally, there’s that neat feature where Avamar presents its backup images as read-only filesystems that can be mounted, copied, etc. In this case, those are individual VMFS containers, which is probably the granularity you’ll want.
There’s more to the story here, but I think you get the picture. And, without client-side data deduplication and a global knowledge of existing chunks, it wouldn’t be nearly as useful.
So, to net it all out, as long as you’re thinking about your next VMware farm, why not think about data deduplication at the same time?
It’ll be cheaper than tape – and I’m not talking made-up marketing numbers. You’ll spend less money on hardware (tapes, servers, HBAs, etc.) to get the job done – plain and simple.
Challenge #4 – Managing End-To-End Service Delivery
I’ve made the case before that we don’t live in a world anymore where one user uses one application. What the user sees is a logical combination of application services that run on an increasingly complex IT infrastructure stack. And IT finds it harder and harder to drive back to a root cause when there’s a performance problem or outage that users are noticing.
VMware didn’t cause this problem, but it doesn’t make it any better. It adds yet another layer that has to be discovered and correlated in the stack. Yes, there are decent tools for this sort of thing within the VMware environment, but that’s only a piece of the puzzle.
I think – later on – people will want the new generation of model-based management tools that automatically discover and correlate physical and logical infrastructure, and can provide real-time root cause analysis when services are impacted.
In the EMC portfolio, that’s Smarts, which offers pretty decent support today for VMware, as well as for the networks, storage and non-VMware environments around it.
Putting It All Together
I apologize for the longish post, but there’s a fair amount to talk about, and I’ve only scratched the surface. It’s a long journey, and we’ve only just begun.
If you’re the average IT shop, you’re probably trying to figure out how to do more with VMware. But, I think you’ll agree with my premise that VMware at scale creates the need to think about a few things differently.
I hope to be writing more about this topic in the future – let me know if you think this discussion is worthwhile.
Moving from physical to virtual servers without careful re-architecting of the environment gives rise to the issues raised in the article and more. The key thing with server virtualisation is that the actual virtual servers should be considered as a resource and rendered anonymous. It's useful to design the environment in a manner similar to clusters, where we talk of "packages" which may be seen to be comparable to "services" in the SOA world view. The next step is to separate the OS block device from the application and database block devices or filesystems. For example, it might be optimal to store the application's program files on NAS and the database on SAN. With EMC Celerra front-ending as NAS we can then define all the constituent volumes as a single device group and then perform TF, SRDF, etc. operations on that set without the complications of considering the servers (virtual or otherwise).
Posted by: Eugene | January 21, 2007 at 07:11 PM