Even though I work for EMC, I’ve always thought of VMware as a separate company. We’ve always used the same alliance model that we use with Microsoft, Cisco, Oracle, SAP, et al.
Given the recent IPO news from EMC, I guess that was the right approach.
We’ve been very busy behind the scenes preparing our portfolio for advanced VMware users. Most of the pieces have fallen into place, so I think it’s time to start painting the picture.
I’ve written about infrastructure issues for VMware ESX 3.0 (or VMware Virtual Infrastructure) before. But there’s more to say.
So, in the spirit of my Microsoft and SAP post, here’s the EMC landscape for VMware. There’s a lot to digest, so I hope you’ll patiently wade through it with me.
VMware Pegs The IT Buzz-Meter
Obvious enough, no?
There’s so much working for it. The need to consolidate servers to save money and save energy. The need to create a more dynamic, flexible environment that’s easier to manage and more responsive to changes. The fact that it’s both tactical and strategic. The fact that it just plain works as advertised.
Not many things come along in the IT landscape that change the game so profoundly and so rapidly.
And that’s just VMware on a server, let alone the desktop proposition, or the thin client proposition – it just keeps getting better and better.
But EMC realized early on that to fully exploit VMware’s capabilities, the infrastructure around it would need to adapt and change.
A virtualized server is a different sort of entity than a physical server. Just because something works in a physical world doesn’t necessarily mean that it works in (or exploits!) a virtualized environment.
Now, when you’re doing test and dev, maybe this doesn’t matter much. Or if you’re running support applications, maybe it isn’t a big concern.
But during 2006, we started to encounter more and more customers who were very intent on pushing VMware just about as far as they could in their environments.
And they started to ask some very hard questions about how things work in a scaled-up virtualized environment:
- Storage and storage networking
- Backup and recovery
- Disaster recovery and business continuity
- Storage resource management
- Service delivery management
- Specialized skills that might be needed
Now, understand, many VMware shops may never raise an eyebrow over any of these issues. And they’ll never have to read the book on any of these issues.
But, if you’re one of those who has a gleam in their eye about doing much more with VMware at some point, you may find this somewhat useful.
Let’s start with the basics
Storage and Storage Networking
On the face of it, there’s nothing special about storage in a VMware environment, unless you’re serious about the size and scale of your deployment.
VMware, like most operating environments, requires extensive qualification and interoperability work in any SAN, NAS or iSCSI environment.
You want to make sure that your particular combination of server, HBA, SAN, array, etc. has been qualified, is fully supported, and all the gotchas have been identified.
EMC’s eLab traditionally does this work for other server environments, and VMware is no exception. Take a look at EMC’s eLab Navigator, and you’ll find an exhaustive suite of tested and certified combinations that are all fully supported by EMC. More get added regularly.
It’s a big deal for us.
Of course, VMware has their own qualification matrix. I wish the two would line up perfectly, but there’s always some sort of gap where we’ve tested something, and it hasn’t shown up on VMware’s matrix yet. But it’s getting better all the time.
And, if you look closely, you’ll probably see that not everyone goes the extra step and qualifies the advanced features like VMotion, or DRS, or HA, or interesting combinations.
EMC does.
Then there’s the question of storage network design. And the tradeoffs can be complicated.
First, on the server side, there’s an emphasis on dense, cost-effective packages for VMware farms, either blades or rack servers. And a decision to use, say, FC instead of NAS or iSCSI can be a significant cost consideration at scale.
Having someone who can characterize your environment and make a recommendation might be useful. EMC can do that.
On a more esoteric front, there have been a few customers who are toying with the idea of creating service tiers within a DRS environment. This probably entails a few big beefy servers to handle the demanding tasks when they come up, a larger population of more cost-effective servers, and setting up DRS to move things appropriately on the fly.
Well, the same thinking would probably carry through to storage. Multiple FC channels to the beefy servers, maybe iSCSI or NAS for the other ones.
The challenge in the future will be to synchronize DRS’s policy calls (move!) with storage capabilities to do the same between high performance and more moderate performance storage devices. Not there yet, to be sure.
Backup and Recovery
Backup and recovery seems to be one of the first topics people reopen when considering more demanding VMware environments.
One thread is managing the tradeoffs. As an example, if you load a sophisticated backup client in every VM, you’ll be using a lot of memory and processor resources, but you’ll have the ability to have per-VM granularity.
Or, the more efficient approach might be to use something like VCB or volume-oriented backup. More efficient, but in some cases you may lose the granularity of easily being able to recover portions of individual virtual machines.
Most of us at EMC think this is a great time to consider a third approach – data deduplication.
EMC Avamar specifically.
Why?
It just works well on so many levels. First, we’ve noticed that VMware environments have a preponderance of duplicated data, specifically programs and binaries. Avamar exploits this well.
In most environments, the dedupe is so efficient that very little time is spent moving data outside of the virtual machine. Not only do you get dramatically shorter backup windows, but in many cases we’ve noticed that customers can opt for more cost-effective servers and storage networks (e.g. iSCSI or NAS instead of FC).
Finally, EMC Avamar presents its backup images as native file systems, which – in this case – are individual virtual machines, time-sequenced by backup time: Monday, Tuesday, Wednesday and so on. That makes granular restore pretty easy and flexible. Just copy your backup image to the proper location (or a portion of it), and you’re off to the races.
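To make the dedupe argument concrete, here’s a toy sketch of content-addressed chunking, the general technique behind products like Avamar (which in reality uses variable-length chunking and a client-side protocol far more sophisticated than this). Two simulated VM images share the same OS binaries, so the shared chunks are stored only once:

```python
import hashlib

def chunk(data, size=4096):
    """Split a byte stream into fixed-size chunks (simplified; real
    dedupe engines typically use variable-length, content-defined chunks)."""
    return [data[i:i + size] for i in range(0, len(data), size)]

def dedupe(images):
    """Store each unique chunk once, keyed by its content hash.
    Each image is reduced to a 'recipe' of chunk hashes."""
    store = {}
    recipes = []
    for image in images:
        recipe = []
        for c in chunk(image):
            h = hashlib.sha1(c).hexdigest()
            store.setdefault(h, c)  # only previously unseen chunks consume space
            recipe.append(h)
        recipes.append(recipe)
    return store, recipes

# Two hypothetical "VM images" sharing the same OS binaries,
# differing only in their last chunk of application data.
os_binaries = b"A" * 16384
vm1 = os_binaries + b"app-one-data" + b"\x00" * 4084
vm2 = os_binaries + b"app-two-data" + b"\x00" * 4084

store, recipes = dedupe([vm1, vm2])
raw = len(vm1) + len(vm2)
stored = sum(len(c) for c in store.values())
print(f"raw: {raw} bytes, stored after dedupe: {stored} bytes")
```

The four identical OS chunks collapse to a single stored copy, so two 20 KB images need only three unique chunks on disk, and only new chunks ever have to leave the virtual machine during a backup.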
Besides the intellectual argument for data dedupe, what I suspect is that many IT thinkers just can’t bring themselves to implement a classic backup model on next-gen server environments.
Whether it’s a traditional backup method (e.g. backup clients in the virtual machine), using VCB capabilities, or going full data dedupe a-la-Avamar, EMC can present the options and tradeoffs to help people make an informed decision.
Business Continuity and Disaster Recovery
If it’s important, it’ll probably be a candidate for remote replication at some point. And once again, the use of virtual machines creates a few new wrinkles for people to think about.
The most powerful and popular remote replication products today run in storage arrays. I’m talking about things like SRDF and MirrorView. But with this, there’s a challenge – the storage array has no concept of virtual machines.
Customers may find themselves being told they have to replicate the entire ESX image, rather than individual virtual machines. Not optimal.
Now, VMware recognized this issue a while back, and created a raw device model that gets around this particular issue, but in the process, prevents you from using a bunch of cool features in VMware. Not optimal.
There’s another school of thought that promotes doing remote replication in the server. Yes, this gets you per-virtual-machine granularity, but there’s another problem.
I think this is workable for very small numbers of servers, but breaks down in more serious VMware environments. I don’t think anyone wants to be responsible for dozens (or hundreds!) of individual VMware replication sessions. Not optimal.
What you’d ideally like to have is:
- Choice of per-virtual-machine granularity, or aggregates of virtual machines
- Choice of replication model: sync, async, point-in-time, continuous data protection (CDP)
- The ability to use external infrastructure to do the heavy replication lifting, moving it off of the individual servers
- The ability to use one technology infrastructure to replicate both virtualized and non-virtualized servers
- And, of course, be as agnostic as possible about choice of array or server, be able to scale up, have a proven and well-supported solution, and so on.
The exciting news is that’s exactly what’s shaping up with EMC RecoverPoint.
Here’s what I like.
- You can run the data splitter at either the switch level (today) or within a virtual machine (soon).
- All the heavy data movement and management happens outside the server.
- It offers all the traditional replication modes, plus a few new useful ones.
- It’s reasonably storage agnostic.
- And it can be used with both virtualized and non-virtualized environments simultaneously.
Bottom line – even though EMC has a whole slew of remote replication technologies, it looks like RecoverPoint is going to be a winner in large VMware shops.
Storage Resource Management
Well, once again, the virtual server environment presents some problems for SRM products. Not an issue to the casual user; more concerning to someone who’s thinking big with VMware.
One of the key capabilities everyone wants out of an SRM environment is end-to-end discovery and visualization. Show me how this application connects to that HBA through this port to that array and this LUN and so on.
Well, not surprisingly, most SRM products don’t know about virtual servers.
Except one, that is. The forthcoming release of EMC ControlCenter 6.0, specifically.
If you look a little farther in the roadmap, you’ll see storage utilization tools that are virtual-machine-aware, provisioning that’s virtual-machine aware and so on.
Not everything in the first release, but it’s coming.
A related discussion is storage array control from either a CLI or API. Lots of EMC customers control things like replication or volume management from scripts or programs. You’ll want to make sure that facility is still available from within a virtual machine, or a virtual console.
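As a sanity check, a guest-side script can verify that the array-control tool it depends on is actually reachable from inside the virtual machine before relying on it. This is only a sketch: `array_cli` and its arguments are made-up placeholders standing in for whatever real replication or volume-management CLI you use, not an actual EMC command.

```python
import shutil
import subprocess

def build_snapshot_cmd(device, label, tool="array_cli"):
    """Assemble (but don't run) a snapshot command for a storage device.
    'array_cli' is a hypothetical tool name, not a real command."""
    return [tool, "snapshot", "--device", device, "--label", label]

def run_from_guest(cmd):
    """Run the CLI from within the virtual machine, failing loudly
    if the tool isn't installed or visible on the guest's PATH."""
    if shutil.which(cmd[0]) is None:
        raise RuntimeError(f"{cmd[0]} not found inside this VM")
    return subprocess.run(cmd, check=True, capture_output=True, text=True)

cmd = build_snapshot_cmd("/dev/sdb", "nightly")
print(" ".join(cmd))
```

The same pattern works from a virtual console: the point is simply that scripted control of replication or volume management shouldn’t silently break just because it now runs one layer up from the physical host.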
Finally, if you see yourself doing a lot with local replicas (cloning and such), you’ll probably want to make sure that whatever tool you’re using for replication management is virtual-machine-aware. EMC’s Replication Manager has an increasing set of capabilities that can be used with (and from) virtual machines.
Service Level Delivery
So by now, you’re probably starting to see a theme. Traditional tools don’t work so well in an advanced VMware environment, creating the need (or opportunity?) to look at newer tools suited to the job at hand.
Service level delivery is no exception to this theme. And it seems that the vast array of traditional tools that people use (enterprise management frameworks, server and application management tools, and so on) are all struggling to understand the nuances of a virtual server environment.
EMC has more work to do here, but we’re making some significant progress. One key capability is Application Discovery Manager (formerly nLayers).
The name is a bit of a misnomer – what it does is hi-def, near-real-time discovery of the entire infrastructure: applications, relationships, servers, storage, networks, et al. It can populate a CMDB with what it finds, show you changes over time, and does it all without agents. Very cool.
My belief is that this real-time discovery capability is very useful in non-virtualized environments, but becomes even more interesting (necessary?) in large-scale virtualized environments where there’s yet another layer that has to be mapped and understood.
The same general argument can be applied to real-time root cause analysis via the Smarts offering. The ability to use a discovered model to correlate all the various alarms and alerts into something that’s actionable sooner rather than later is extremely valuable in traditional environments; the ability to do this in environments where there is yet another layer to understand will be even more interesting.
Not all the product work is done yet by EMC, but I’d argue that retrofitting a 1990’s style management model on next-gen virtual server environments won’t sit well with most IT architects.
Services
Any time there’s a new kind of environment to be supported, there’s a need for new services.
There’s a need for professional, consulting-type services to help people assess, design and implement their new environments. And there’s a need for back-end customer support services that get to problem resolution quickly, without a lot of finger-pointing.
I’m pleased to say that I believe that EMC is ahead of the game here in both regards.
Not only do we have people who’ve done consulting in VMware environments, but we’ve got more and more solution blueprints being developed that can show people how to assemble the pieces quickly and get predictable results.
On the back-end, our “guilty until proven innocent” support culture is proving to be helpful as well. Very often, advanced VMware users are on the bleeding edge of making things work well together. Our habit of taking ownership of complex problems until they’re resolved has shown to be very helpful in the larger implementations as well.
Putting It All Together
EMC thinks (and so do I!) that people wanting to get the most out of VMware in advanced deployments will want to take a hard look at the supporting infrastructure.
- They’ll want to think about the storage and its network differently.
- They’ll want to consider how best to do backup and recovery.
- They’ll want to understand the pros and cons of different business continuity approaches.
- They’ll want storage resource management that works well with virtualized and non-virtualized environments.
- They’ll want advanced tools that help them discover their IT infrastructure, and use that knowledge to quickly resolve service delivery issues.
- And they’ll want to work with a vendor who can help them be successful with VMware.
I think EMC has done a good job in putting the pieces together. Clearly, there’s more work to do, but I look at where we are and can say that we’ve got an excellent start with more coming.
As I look out over the next few years, I also think that the value-added integration we’ve had to do with VMware will also be valuable as other virtualized server implementations reach the market and may be considered worthy of deployment.
One thing’s for certain: tomorrow’s server will be a virtual one, and there’s no putting the genie back in the bottle.
Chuck,
Well-written post on VMware. It’s great that EMC recognizes the greater value Avamar offers to VMware, unlike the popular media, who thought Legato would be the major beneficiary.
IMO, Avamar de-dupe will take VMware deployment to the next level: eliminating duplicate data from VMs on the same PM, reducing data transfer between VMs, between PMs, and between VMs and storage, increasing effective memory capacity for VMs, and improving VM memory load balancing.
I am sure both EMC and VMware are looking at numerous other ways to incorporate de-dupe. What about other EMC units? How do you see Avamar technology showing up in other products?
Just wondering how the Avamar transaction will play out financially after the VMware spin-off?
Anil
Posted by: Anil Gupta | March 01, 2007 at 12:33 AM
Hi Anil
I'm glad you see the connection between client-side de-dupe and server virtualized environments. It kind of hit us over the head like a ton of bricks.
Over time, I think you'll see dedupe concepts just about everywhere in the storage stack. It's kind of like security -- it belongs everywhere.
Now, I can't say specifically what capabilities you'll see where and when, but you're a smart guy, and I bet you can spot a few neat combinations that aren't that hard to do.
The bigger challenge is that any dedupe technology presents a different storage service level; specifically, writing into a dedupe space can be s-l-o-w if not handled appropriately.
Also, underlying data protection becomes an issue. Lose a block that's part of a dedupe set, and you'll find a neat hole in your data in all sorts of places.
I personally think the VMware spin-out is a unique situation. I'm no expert, but I don't think you'll see us (or anyone else) doing this sort of thing too often.
Look forward to more discussion on this -- and other -- topics!
Posted by: Chuck Hollis | March 01, 2007 at 10:28 AM
Hi Chuck,
Thanks for your insightful blog on how things related to storage networking, DR solutions, etc. need to adapt for virtualized environments, which are spawning quite fast the world over. I kinda get a sense that it's VMware who's changing the dynamics in the storage market. From the shelf of this server virtualization leader, the industry is awaiting their new Site Recovery Manager software, which automates and runs a checklist of all the criteria before a site-to-site failover happens, ensuring that failures during this process are reduced or brought to nil. Is EMC planning something similar for non-virtualized as well as virtualized environments? Isn't site-to-site failover automation software like VMware SRM a "must-have" in the product portfolio of a company (EMC) which is well-known in the industry for their array-based replication products?
Posted by: Sundar | May 07, 2008 at 02:51 PM
Hi Sundar -- yes and yes.
We're investing heavily in upcoming support for VMware's SRM -- we see it as a really big deal for our customers.
But, at the same time, there's certain aspects that VMware brings to the table in DR scenarios which just can't easily be done in the physical world -- for example, creating a "virtual failover environment" where you can test failover logic in a containerized environment, without impacting production.
Simply put, I don't think we'll ever make physical failover as elegant as what's possible in the virtual world -- although we'll certainly try!
Posted by: Chuck Hollis | May 09, 2008 at 06:21 AM