Maybe I'm a bit slow, but when I see the same thing happen 5, 10 or maybe 20 times, sometimes a dim bulb goes on in my head, and I say "wait, maybe there's a pattern here!".
As you know, I spend a lot of time with customers. It's so much fun, I sometimes wonder why I'm allowed (and paid!) to do it.
And you know I've written about information infrastructure considerations for VMware ESX before.
So, I'm now starting to see the same thing over and over again.
And it's entirely avoidable.
Everybody Gets The VMware Thing
As far as server virtualization with VMware goes, just about everybody gets it.
Just like the stand-up comic who asks the audience deadpan "how many people are here tonight?", we can ask the same sort of question "are you doing anything with VMware?" and you'll get the same sort of answer.
But it's evolved a bit.
Everyone gets the cost-savings thing. Everyone gets the energy-saving thing. Everyone gets the ease of management and flexibility thing.
I even think there's a vendor backlash from customers along the lines of "yes, yes, we get it, so enough already!". But I think the problem has moved on from awareness and engagement to execution.
Stalled VMware Projects
I see the same pattern starting to show up.
The projects are identified. The ROI is established -- it's the classic no-brainer. Funding is approved. And work begins -- and seems to stall somewhere along the line.
Six months ago, a customer was planning to do a large-scale VMware project. Today, they're planning to do a large-scale VMware project. Will it still be true in six months that they're planning to do a large-scale VMware project?
Yes, big IT projects take time. But, like I said before, once you see the same picture multiple times, you start seeing a pattern, one I'd like to share here.
So, What's Happening Here?
I think there's a fundamental mis-scoping problem that's going on in IT when they consider these large-scale VMware projects.
And, as a result, there's a few big implications about how to think about the problem.
I think that one of the fundamental problems is that these projects are being positioned as a server migration -- much like you'd move from 32-bit to 64-bit technology, or from racks to blades.
The server team is engaged, they dust off their Server Migration Handbook, and off they go.
Well, if you think conceptually about VMware server virtualization, that's only a piece of what's going on. I would offer that far more is on the table here than just a new flavor of server or operating system.
You're fundamentally changing how a big piece of IT operates.
And, because the problem isn't scoped at the proper scale, the assigned teams are sub-critical-mass, and the project tends to stall.
And this thought isn't theoretical -- I can name more than a few customers who would agree with this assessment.
Infrastructure Issues
This discussion breaks into two parts: people and technology.
The people issues are pretty easy to understand, once you think about it. IT typically has different specialization groups (server, storage, network, database, apps, etc.).
We often saw the server guys trying to get all the other groups to rally behind the server virtualization idea (after all, it was assigned to them), and -- well -- the inevitable happened.
The storage guys, and the network guys, and the database guys, and ...
And, as a result, I've seen more and more customers start to assign the server virtualization project to a cross-functional manager (or executive) who can engage and coordinate multiple disciplines, as opposed to letting the server guys try and convince everyone.
I know this must sound obvious to many of you, but you'd be surprised at just how many customers I meet where there's a VMware project going on, and they handed it to the server guys, and -- well -- it's stalling.
The technology issues are becoming pretty obvious to more and more people. About two years ago, we started to get exposed to infrastructure issues when considering VMware at scale.
And the pattern was pretty much the same. Here's how we did something in the physical world, and there's some considerable work to make sure that it works as expected in the virtual world.
Or, now that we think about it, given the fact we're in the virtual world, maybe there's a better way to approach the problem.
One classic example is backup. There was considerable effort on EMC's part to make sure our classic backup/recovery products (NetWorker) worked with VMs, supported VCB, et al.
But at the same time, gee, all those VMs are really files sitting on disk. And there's a ton of duplicate information between them.
Maybe a file-oriented dedupe approach would be fundamentally better, given that we're living in a virtual world?
Hence Avamar.
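If you like to think in code, here's a toy sketch (mine -- not Avamar's actual algorithm) of why file-oriented dedupe pays off so well across VM images: hash fixed-size chunks and count how few are actually unique when images share an operating system payload.

```python
import hashlib

def dedupe_ratio(images, chunk_size=4096):
    """Estimate dedupe potential across a set of VM image byte strings
    by hashing fixed-size chunks and counting the unique ones."""
    seen = set()
    total = 0
    for data in images:
        for i in range(0, len(data), chunk_size):
            chunk = data[i:i + chunk_size]
            seen.add(hashlib.sha256(chunk).hexdigest())
            total += 1
    return len(seen) / total  # fraction of stored chunks that are unique

# Two "VM images" sharing a common OS payload dedupe nicely:
os_part = b"A" * 8192
vm1 = os_part + b"B" * 4096
vm2 = os_part + b"C" * 4096
print(dedupe_ratio([vm1, vm2]))  # → 0.5
```

Half the chunks vanish even in this trivial case; with dozens of VMs cloned from the same templates, the savings get dramatic.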
Another classic example is remote replication. Again, there was considerable effort to make sure that classic replication products (SRDF, MirrorView, et al.) all worked with virtualized servers.
But once we thought about it, it was clear we'd want per-VM replication and recovery (without turning off VMFS), yet use external network-based replication.
Hence RecoverPoint.
Same general discussion for storage resource management (SRM). And real-time discovery and correlation of IT resources (Smarts). Some of the Cisco guys I talk to tell me that they're discovering the same thing with IP network design.
And I'm sure there will be an information security discussion before too long.
Long story short: you can continue to solve infrastructure issues as you did in the past, and just layer on VMware (the default approach), or you can start with "what would work best in a virtual world?" and take a bigger step.
And, not surprisingly, when we put this line of thinking in front of most customers, they tend to agree. Now, there's a whole trade-off discussion, as -- of course -- you're making the project bigger by layering more topics into it.
But -- at a minimum -- you should consider the potential and selectively choose what aspects you want to upgrade from a virtual point of view in the first take.
And, based on about 30-40 customer interactions, it seems to narrow down to three topics for most customers:
storage network design
yes, there are qualification concerns, as most people tend to probe advanced VMware functionality (e.g. DRS, VCB, et al.). EMC had to do a ton of qual work here. And there's an FC vs. iSCSI vs. NAS debate that often gets revisited.
But there's another layer to the discussion.
We've seen more than a few customers end up with lots and lots of virtual machine images. Think of them as a library of server image containers with operating system, database, application code, data files, etc.
Some are actively running, some are under development, some are being kept around for historical purposes. And our old friend storage tiering (or ILM) becomes interesting because -- unlike the physical world, where you didn't (or couldn't) keep a library of well-packaged software appliances ready to run on a moment's notice, well -- now you can!
Customers tell me it's just so darn easy to gen up a slightly different server image for the task at hand that they're doing it more and more often.
And, if you're like most people, you'll end up with quite a collection, and not all of them need to be running on tier 1 storage.
Heck, I've even met customers who've started to store VM containers in a Centera for compliance and retention purposes, in addition to the fact that they're proliferating like crazy!
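To make the tiering idea concrete, here's a toy policy sketch (the tiers and thresholds are purely illustrative, not a recommendation): running images stay on tier 1, recently used ones go to tier 2, and the rest head for the archive.

```python
from datetime import datetime, timedelta

def assign_tier(last_used, now, running):
    """Toy tiering policy for a VM image library.
    Thresholds are hypothetical -- pick your own based on usage patterns."""
    if running:
        return "tier1"          # active VMs need the fast stuff
    idle = now - last_used
    if idle <= timedelta(days=30):
        return "tier2"          # recently touched: keep it handy
    return "archive"            # dormant image: cheap-and-deep storage

now = datetime(2007, 6, 29)
assert assign_tier(now, now, running=True) == "tier1"
assert assign_tier(now - timedelta(days=7), now, running=False) == "tier2"
assert assign_tier(now - timedelta(days=365), now, running=False) == "archive"
```

The point isn't the thresholds -- it's that a VM image library is exactly the kind of well-packaged, cool-to-cold data that tiering (or ILM) was invented for.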
backup and recovery
I've written about this before, and now I'm going to do it again. Most people don't like their current backup environment, but it's the legacy investment that keeps them from getting serious about a rip-and-replace.
Well, here are the facts. You'll never have a better chance to revisit the topic than when you're considering a VMware ESX implementation. You'll want to exploit client-side dedupe for a whole bunch of reasons. And you'll want to store the backup images as mountable file systems (rather than locked away in tape format) for another whole bunch of reasons.
And you'll kick yourself if you don't consider both at the same time, because you'll never have a better opportunity than when the big VMware farm is going in.
And, as I understand it, there's only one dedupe product that exploits (instead of tolerates) VMware -- and that's Avamar.
storage resource management
I think you'll want to run reports on who's using what storage, won't you? Wouldn't it be nice to report that on a per-VM basis?
And I think you'll want to do end-to-end discovery -- here are the virtual machines, here are the ESX servers, here are their virtual and physical HBAs, here are the SAN ports, here are the storage arrays, etc.
And, as I understand it, there's only one SRM product today that does this -- and that's ControlCenter.
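If it helps, here's what that end-to-end discovery chain looks like as a toy data-structure walk. All the names and mappings below are made up for illustration -- in real life, a discovery tool populates this topology for you.

```python
# Hypothetical topology: in practice this comes from discovery tooling.
topology = {
    "vm_to_esx":     {"vm01": "esx-a", "vm02": "esx-a"},
    "esx_to_hba":    {"esx-a": ["hba0", "hba1"]},
    "hba_to_port":   {"hba0": "san-port-3", "hba1": "san-port-4"},
    "port_to_array": {"san-port-3": "array-1", "san-port-4": "array-1"},
}

def storage_path(vm, topo):
    """Walk VM -> ESX host -> HBAs -> SAN ports -> storage arrays."""
    esx = topo["vm_to_esx"][vm]
    hbas = topo["esx_to_hba"][esx]
    ports = [topo["hba_to_port"][h] for h in hbas]
    arrays = sorted({topo["port_to_array"][p] for p in ports})
    return {"vm": vm, "esx": esx, "hbas": hbas, "ports": ports, "arrays": arrays}

print(storage_path("vm01", topology))
```

Answering "which VMs are touching which array?" is just this walk run in reverse -- which is exactly why per-VM reporting needs the whole chain, not just the server layer.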
And Then There's The ITIL Run Book
Lots of discussion in the professional services realm on doing migrations from physical to virtual. EMC does it, lots of people are doing it.
That sort of work gets you there -- but doesn't tell you what to do when you arrive. And that's a new challenge I've been seeing more of.
Here's why:
Most good shops have a run book on how to do normal day-to-day things in IT. Call it ITIL, call it common sense -- somewhere, there's a physical (or virtual!) notion of how process works in IT.
Not to scare you too much, but significant portions of that run book change in a server-virtualized environment.
Let's talk about a simple example: server provisioning.
Old school: request comes in, funding identified, lengthy discussions on requirements and config, hardware ordered, installed, configured, validated, and turned over to the requestor. Elapsed time: days to weeks to months.
New school: here's a virtual server, have fun. Elapsed time: minutes to hours.
Now, obviously there are a lot of steps in the old process that are no longer value-added. But there are also some new steps (e.g. managing aggregate farm capacity) that weren't part of the old-school process.
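That "aggregate farm capacity" step deserves a moment. Here's a toy sketch of the kind of check a new-school run book might call for before handing out a virtual server -- the headroom number is purely illustrative, not a recommendation.

```python
def can_provision(farm_capacity_ghz, committed_ghz, request_ghz, headroom=0.2):
    """New-school provisioning step: check aggregate farm capacity
    (reserving a safety headroom) before granting a virtual server.
    The 20% headroom default is hypothetical -- tune for your shop."""
    usable = farm_capacity_ghz * (1 - headroom)
    return committed_ghz + request_ghz <= usable

# 100 GHz farm, 20% headroom -> 80 GHz usable:
print(can_provision(100, committed_ghz=70, request_ghz=5))   # → True
print(can_provision(100, committed_ghz=78, request_ghz=5))   # → False
```

In the physical world, this check was implicit in the hardware purchase order. In the virtual world, nobody orders hardware per request -- so if the check isn't written into the run book, it simply doesn't happen until the farm runs out of gas.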
OK, how about problem resolution? Or patch management? Or storage provisioning? Or other aspects of day-to-day IT management?
Here's the bottom line: if you've got an ITIL run book, there are several old pages that have to change, and several new pages that have to be added. It's not hard, but it needs to be done.
And if you don't have a documented run book, well -- it's a bit harder -- as there's no centralized document to focus on -- you're facing lots and lots of meetings and training, and -- well -- bumps in the road.
I see this as another conceptual hurdle that -- with a bit of forethought -- is relatively easy to plan and execute. But you have to know it's coming, don't you?
The Bottom Line
I (as well as many others) see server virtualization -- as exemplified by VMware -- as transformational.
We're just not going back to the physical world, much in the way that the internet made point-to-point networks obsolete.
And I've met way too many people who see the potential, but have framed the problem as narrowly confined to the server guys.
And in the act of doing so, they're causing three problems:
- delayed projects (and associated delay of benefits),
- missed opportunities to fix related problems at the same time (infrastructure),
- and a bit of chaos when they realize their run book was designed for the physical world, not the virtual one.
Hope we can help!
Nice post! You have said it very well. Keep going.
Posted by: Kim | June 29, 2007 at 10:04 AM
Nice perspective. We (sales professionals in IT) sometimes get caught up trying to solve a 90-180 day problem for clients based upon quarter cycles and have difficulty organizing a long term strategy of products around VMWare. This is a great tie in that relates to the sales and technical folks steering customers initiatives in any IT org.
Thanks,
Michael
Posted by: Michael Hasbany | June 29, 2007 at 12:37 PM
Chuck, I've seen customers take months evaluating VMware, and only recently does it seem to be taken seriously.
I still see customers running "heavy" apps on dedicated hardware though and I hope that this will change as it reduces the impact of the virtualisation message.
Keep up the good work,
Nick
Posted by: Nick Jones | July 11, 2007 at 06:42 PM
Hi Nick
Very good point.
At one time, I believed that "heavy" applications, e.g. ones that consume a large server, weren't a target for server virtualization, mostly because the "consolidation rationale" wasn't there.
But since, I've come to believe that the management benefits associated with VMware can stand on their own, even for heavy apps.
Encapsulating an important app as a virtual machine means it can be moved from place to place at will, either dynamically (think DRS), or to-and-from a test/dev environment (think Lab Manager).
And, interestingly enough, I've started to meet my first few customers who feel the same way -- they're starting to encapsulate some of their heavier apps, and are justifying it simply on accelerated maintenance and management savings.
We'll see if we see more of this going forward, won't we?
Thanks for writing!
Posted by: Chuck Hollis | July 12, 2007 at 01:51 AM
And as Chuck said, and I believe, we shouldn't underestimate the cost savings! I did a project last year and proposed 20% savings. I showed it and "middle-managers" had issues with that (whatever it may have been), and the information/report was not pushed up to the senior management.
The "old school" ITIL and Change Management need to be revised, rewritten whatever. I'll call it BTIL (Business Technology), we need to address the needs of the clients. What I notice myself is that an optimally functioning Virtualized infrastructure brings "passionate and creative" people together. And they are everywhere. They are your clients, the server guys, everyone!
About heavy apps: I spoke to a shop (a gov shop) last week and they had Oracle on VMware. Performance problems, and the vendor blamed it on VMware. VMware's expert came in and pinpointed it to the application. I personally believe that, if carefully tested, even Oracle RAC can run pretty well on VMware (ESX).
So what I notice is:
o Lack of an active community (Not anymore:Glad that VMware has started the initiative http://wwwa.vmware.com/www/wiki/)
o Commitment from top (CIO's should play a very active role)
o Lack of critical information*
* = We eventually started with the project but all the collected data was never considered, fortunately in this case I am heading the operation (and evangelizing it within the Server Division and outside to the clients as well!). They did it because I kept hammering.
Normally clients like these end up going over the project budget in all kinds of consultancy where they don't have consultants on "tap" but (sadly) on *top*.
Posted by: Tarry Singh | July 13, 2007 at 06:19 AM
Nice article here! I really have to agree with you when you talk about ITIL and how incorporating that into network automation means things go from hours to minutes. I know of a few companies who do it the “old school” way and they are always trying to improve application availability but with little success.
I see that you also make some very valid points on why people providing IT service, http://www.stratavia.com should take a look at and implement run book automation. Things change very quickly in a networks environment and to have human hands in the mix is really a waste of time these days. If a network is capable of patching itself through the use of network automation then people need to step back and let the system manage itself.
Posted by: Kyle O'Campo | October 07, 2007 at 04:24 PM
Thanks for a comment. Couldn't tell if you were sneaking a plug in there somewhere, but it was so subtle that I had to let it pass ... !
Posted by: Chuck Hollis | October 08, 2007 at 11:50 AM
Great article and very timely for me as we (re)embark on a fairly significant P2V VMWare project here and it looks like I might be that "cross-functional manager" for the deployment!
:|
Posted by: Robin Majumdar | November 29, 2007 at 01:59 PM
Awesome article and insight!
I guess I will be quoting from your blog in my presentation on VMware.
Posted by: UNIX Guy | March 29, 2011 at 12:07 PM