I don't know if you've ever taken a car trip through the Rocky Mountains (Colorado, Wyoming, etc.), but if you stop in a friendly diner, you're likely to see a picture of a "jackalope" -- a monstrous cross between a jackrabbit and an antelope.
The locals will tell you with a straight face how the nearby ranchers prefer them to horses since they run much faster and eat less. Of course, no such thing exists, but it's good fun.
For some reason, there are those in the industry trying to make a connection between an extremely hot topic (server/desktop virtualization) and less-popular topics (storage virtualization, dedupe, thin provisioning, etc.).
Just like the infamous jackalope, the possibility is plausible to tourists, but somewhat of an inside joke to the locals.
The Connection?
No doubt, server/desktop virtualization is probably the single hottest topic in IT these days. It's more popular than grid, cloud, SOA, et al. put together. And, without a doubt, the parade is being led by VMware -- they own the discussion, no matter how much others try.
And if you're an IT vendor, you're doing anything possible to link yourself to this white-hot trend.
You're spending time brainstorming with your peers trying to figure out how you're relevant to this trend, and how you can make it appealing to customers and investors.
Sometimes the connections are broad and relevant -- I'd argue strongly that EMC is in this category. You'd expect that, right?
Other times the connections are more tenuous.
Let's take storage virtualization as an example.
If you're a storage vendor, you talk a lot about storage virtualization, especially if you don't have much else to talk about. On one level, you have to, simply because your competitors are doing the same.
Well, there's a shared buzzword between the two concepts, isn't there? And, if you're not from around these parts, maybe you can be convinced that there's a relevant connection between the two, right?
Well, no, not really.
Servers and Storage Are Different
Servers do computing work, running multiple tasks.
If one task isn't fully using server resources, there's ample opportunity for another task to come in and use the same resource. Multitasking (the distant ancestor of server virtualization) has been around since the 1960s. And the idea lies at the root of the primary economic benefit of VMware, i.e. using fewer server resources to get the same work done.
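To make the consolidation argument concrete, here's a rough sketch of packing lightly loaded workloads onto shared hosts. The utilization figures, the 80% headroom ceiling, and the first-fit-decreasing heuristic are all my own illustrative assumptions, not how VMware actually places virtual machines.

```python
def consolidate(utilizations, capacity=0.8):
    """Pack workloads onto hosts using first-fit decreasing.

    utilizations: average CPU load of each workload, as a fraction of one host.
    capacity: hypothetical ceiling per host, leaving headroom for spikes.
    Returns a list of hosts, each a list of the loads placed on it.
    """
    hosts = []
    for load in sorted(utilizations, reverse=True):
        for host in hosts:
            # Small tolerance so float rounding doesn't reject an exact fit.
            if sum(host) + load <= capacity + 1e-9:
                host.append(load)
                break
        else:
            # No existing host has room, so bring up another one.
            hosts.append([load])
    return hosts


# Ten made-up workloads, each of which used to own a physical server.
workloads = [0.10, 0.05, 0.20, 0.15, 0.08, 0.12, 0.30, 0.07, 0.10, 0.05]
hosts = consolidate(workloads)
print(f"{len(workloads)} dedicated servers -> {len(hosts)} shared hosts")
```

The point is only that idle compute cycles can be handed to someone else the moment they're free, which is exactly what stored data can't do.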
At one level, an application wrapped in a VMware container really doesn't care much where it runs -- any server will do. That leads to additional benefits, like flexibility, easier management, load balancing, etc.
Storage and information have a fundamentally different sharing property.
Information has to stick around between uses. Hence the need for storage capacity.
Not to belabor the obvious, but if one application decided to free up needed storage capacity by deleting another application's storage, this would not be OK with the other application owner.
As an example, my kids recently took it upon themselves to delete my wife's favorite programs off the DVR to make room for their cartoons. Their logic? "Well, Mom wasn't using it".
That argument works when you're talking about the TV. But it doesn't work when you're talking about the DVR.
Now, I Understand How It Can Get Confusing
Do both servers and storage gain consolidation benefits from virtualization?
Well, technically yes, but the effects are much more pronounced with server virtualization. By comparison, simple good housekeeping can yield the same consolidation benefits from storage, usually without the need for "virtualization technology".
Does virtualization make it easier to move things around for both servers and storage?
Well, technically yes again, but the benefits are dramatically different. In a large virtualized server pool, the ability to dynamically balance workloads (think VMware VMotion, DRS, etc.) is a huge win, simply because there's such a dynamic range in application workloads -- pooling and load-balancing lead to order-of-magnitude benefits, which is why it's such a hot topic.
Storage environments have far less dynamic range, things need to be moved around much less often than application workloads do, and the economic benefits are less pronounced.
Simply put, the two technologies can make roughly similar claims, but I think the magnitude of the benefits is vastly different. And few people dig down deep enough to understand the differences.
And It Can Get Even More Confusing ...
With ESX 3.5, VMware announced a cool feature -- Storage VMotion -- which, technically, is a flavor of storage virtualization that runs in the server. Gee, if VMware is doing storage virtualization, it must be cool, right? And all the companies doing storage virtualization must be pretty cool, right?
I know, it sounds a little silly, but I've heard this on more than one occasion.
In larger organizations, server virtualization is driven by one particular group with a very specific set of objectives, and -- of course -- everyone is doing it. Storage virtualization (if it's done at all) is driven by another group with a very different set of objectives.
From my experience, there's almost no overlap. Perhaps they share a buzzword, but nothing more.
But There's More Craziness Afoot ...
One example that's bothering me is that, recently, NetApp has made much of the fact that their file-oriented data deduplication capability (A-SIS) can reduce storage capacity significantly in VMware environments.
Yes, I'd agree that's technically true. VMware images (.vmdk files) tend to have a lot of redundant data in them, particularly the binaries. And they're technically amenable to data deduplication approaches, as compared to information that's already compressed, like zip files and JPEGs. No argument there.
But they're not telling you a few things -- and I think they should.
First, there's absolutely no discussion about the performance implications of data deduplication on production data, and no amount of hand-waving can make this one go away. All forms of data dedupe can be thought of as a different service level than physical storage -- plain and simple. With today's technology, there is no free lunch -- unless your vendor takes you out.
Now, take it from the point of view of the server guy who's trying to get his organization comfortable with using more VMware. The one thing this person doesn't want is things running noticeably slower on VMware, period. Most server pros won't want to take this chance, at least not anytime soon.
Save a bit on production storage? Always a nice thing, yes. Save a boatload on running my server farm? Absolutely compelling. And most people won't knowingly risk the second to get the first.
And, if you think about it a moment, this is a bit of a red herring. If you had 100 server images' worth of data before virtualization, you'd also have about 100 server images after virtualization. Mostly the same kind of data to store, right?
Did the presence of VMware really change anything significant?
Nope.
I Think There's A Better Target For Storage Savings With VMware
Are there parts of a broad VMware landscape where data dedupe makes more sense?
Yes, absolutely. Think backup.
For a given production environment, we usually find there's between 3.6x and 20x more capacity in the "backup shadow". Want to save real money on VMware storage, and not worry about performance? I'd strongly suggest you look there.
I don't have hard numbers, but anecdotally, proficient VMware users tell me that they have a whole lot more virtual machine images sloshing around than they used to. They tend to keep old ones around, just in case. Lots of them.
Another reason to think "backup", since this characteristic is turning out to be materially different in the virtualized environment than the physical one.
When we acquired Avamar a while back, they'd optimized their architecture for the VMware environment.
Their clients ran nicely in a virtual machine.
They implemented a global view of ALL data, and could squeeze far more out of the environment, rather than just looking at isolated chunks as with other approaches.
They did their dedupe on the server, not the storage, which meant that backups happened far faster and with less expensive plumbing.
They stored their backup images in native format, meaning it was easy to simply mount an old version of a VMware image from "backup" and go.
And recently, they've made their back-end run neatly in a virtual machine, meaning that it can run as a virtualized task in your server farm, and not require dedicated hardware, if you so choose. Sweet!
And the overall storage savings are absolutely astounding as compared to snaps, incrementals, etc. -- all the stuff we're used to. Most people take an existing VMware server image, make a few changes, put the new one into production, and keep the old one(s) around.
That's a use case that's extremely amenable to client-side data deduplication, isn't it?
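Here's a toy sketch of why that pattern dedupes so well. It uses fixed-size 4 KiB chunks and SHA-256 fingerprints, which are my own simplifying assumptions and not a description of how Avamar (or anyone else's product) actually works. Random bytes stand in for a VM image, and each "new version" overwrites a single chunk in place.

```python
import hashlib
import os

CHUNK = 4096  # assumed fixed chunk size, in bytes


def chunk_hashes(data, chunk_size=CHUNK):
    """Fingerprint each fixed-size chunk of an image."""
    return [hashlib.sha256(data[i:i + chunk_size]).digest()
            for i in range(0, len(data), chunk_size)]


def dedupe_stats(images, chunk_size=CHUNK):
    """Return (chunks presented, unique chunks actually stored)."""
    store = set()
    logical = 0
    for image in images:
        for fingerprint in chunk_hashes(image, chunk_size):
            logical += 1
            store.add(fingerprint)
    return logical, len(store)


# A stand-in "server image" of 256 chunks, plus three later versions,
# each made by changing one chunk of the previous version.
versions = [os.urandom(256 * CHUNK)]
for n in range(1, 4):
    changed = bytearray(versions[-1])
    start = n * 10 * CHUNK
    changed[start:start + CHUNK] = os.urandom(CHUNK)
    versions.append(bytes(changed))

logical, unique = dedupe_stats(versions)
print(f"{logical} chunks presented, {unique} stored "
      f"({logical / unique:.1f}x reduction)")
```

Keeping four nearly identical images costs barely more than keeping one, because only the changed chunks add anything new to the store.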
But, let's be honest, there are a few use cases showing up in VMware where people are doing heavy-duty transaction or bandwidth stuff. The data isn't particularly amenable to dedupe. There's no time to do data deduplication -- you need blazing performance to move data off, and back on again in a hurry if you need it.
So EMC went a step further, and added Avamar dedupe technology to our NetWorker client. Now, we can offer a single client for a VMware environment that gives you some interesting service level choices: traditional, snaps, replication, dedupe, backup to disk, et al.
As your needs change, the backup client (and its management) doesn't. Dedupe work for you? Great. Need something faster? Great. Need something even faster? Fine. One way to provide multiple service levels, including our popular friend, dedupe.
OK, it's not a single-headline story, I'll grant you that -- it takes a while to explain what's going on, and why it's important.
But, from my admittedly slanted view, it's a far more compelling storyline than just a single feature applied to VMware with somewhat dubious customer benefits.
But Really, I Do Understand
VMware is white hot (although if you look carefully, you'll see NetApp, Dell, HP, Sun, IBM et al. hedging their bets with Xen, Virtual Iron, Hyper-V, et al.).
If you're a smaller company (or even a bigger one) you want to hitch your wagon to this rocket ship. And, in a noisy environment, it's tempting to claim a single feature or buzzword to make what you've got stand out from the crowd.
But does this sort of behavior really create value for customers?
Looks like a jackalope to me ...
Hey Chuck,
Fell over laughing at the double entendre!
"there is no free lunch -- unless your vendor takes you out"
... vendor takes you out to lunch => free lunch for you.
... vendor takes you out (period) => vendor eats your lunch for free !
Enjoy your columns, as always.
Cheers,
Fred
Posted by: Fred San | December 19, 2007 at 05:31 PM