
October 31, 2008


Scott Waterhouse

Let me add two more reasons:

1) Inertia. Never underestimate the unwillingness of backup admins to change. Backup applications are the stickiest applications I have ever seen; people change mail systems more often. So it takes a disruptive event (like VMware) for a change to be a real consideration.

2) We are the only ones really championing source side dedup. Symantec more or less gave up the ghost and went target side with PureDisk, and most other vendors just don't "get it"--they think source is the root of all evil (like DD) or worse, just generally pointless. So we need to do a lot of educating.

Steven Schwartz - The SAN Technologist

I think it is already there; we've just forgotten about it (source-based deduplication, that is). It is already happening at the application layer in many instances. It occurs with databases based on linked records, Windows has it built in on certain server platforms for file & print services, and Exchange has been using it for a few years.

However, Chuck, you hit it spot on. Source-based deduplication typically limits the data set. There is serious talk about the possibility of global deduplication; however, I still feel that certain data sets should NOT be deduplicated, and certain applications should start having this functionality built in, in a more intelligent manner than just recognizing that there are duplicate blocks (block sets).
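The block-level recognition Steven describes can be sketched in a few lines. This is a minimal, illustrative model only -- fixed-size blocks keyed by a SHA-256 digest -- not any vendor's actual implementation (real products typically use variable-size, content-defined chunking), and the names `dedupe_store` and `rehydrate` are made up for the example:

```python
import hashlib

def dedupe_store(data: bytes, block_size: int = 4096):
    """Split data into fixed-size blocks and store each unique block once,
    keyed by its SHA-256 digest. Returns (block_store, recipe), where the
    recipe is the ordered list of digests needed to reconstruct the data."""
    block_store = {}
    recipe = []
    for i in range(0, len(data), block_size):
        block = data[i:i + block_size]
        digest = hashlib.sha256(block).hexdigest()
        block_store.setdefault(digest, block)  # duplicate blocks stored only once
        recipe.append(digest)
    return block_store, recipe

def rehydrate(block_store, recipe):
    """Reassemble the original data stream from its recipe of digests."""
    return b"".join(block_store[d] for d in recipe)
```

Feed it three identical 4 KB blocks followed by one different block and the store holds just two unique blocks while the recipe still lists four entries -- which is also why the recipe (the "links" Sudhir worries about below) must be kept as reliably as the blocks themselves.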

I still consider deduplication a feature in its infancy. I also have the best deduplication theory in development: break every data set down into binary, and then I can store everything as just a single 0 and a single 1. (Reading the data back is the part I'm still working on.)

Sudhir Brahma

Is this a subject more in the realm of Applications and the way they choose to store data they generate, rather than being something to do with storage directly? There could be such utilities that implement source side dedupe and applications can subscribe to their APIs and use them when required, in a collaborative way. Doing source side dedupe across LAN and across storage boxes may be ok for some dedicated applications where the location of the base data is always known and well controlled and the application is aware of such dedupe implementations.
Dedupe across LAN/WAN inherently could add some degree of uncertainty: Data now is no longer autonomous and stored in one reliable location. Instead, a series of links to various such data blobs (stored across storage boxes across LANs) are also required to qualify the storage picture. These links are vital and also need to be stored just as reliably as the base data itself. This loss of autonomy and consequent increase in dependence on various other storage entities to construct the storage picture, could be a cause of concern for some applications.
Target side dedupe could be more assuring, since the storage box is “aware” of and actively involved in providing a consistent storage picture to a consumer/application. Also, for now, target side dedupe is restricted to a storage device and the scope is probably much less expansive than it is in a LAN.

David Vellante

As the saying goes, "Backup is one thing, recovery is everything." Users should make sure they understand their RPO and RTO requirements and ensure the dedupe solutions they choose are aligned with those critical factors. Thanks. - Dave from wikibon.org

W. Curtis Preston

I'm sure it's no surprise that I've got something to say about this. First, I'll give you MY answer to your and George's question.

I agree with Scott that the primary reason is inertia. It just takes a long time to turn the direction of the backup ship. (The aircraft carrier I served on had a turning radius of over a mile.)

My other reason is that source dedupe is primarily for remote sites, and most people are focused more on the central sites. They tend to ignore the remote sites, so it causes a greater level of inertia.

My third reason is the flip side of the second reason. It's not JUST that source dedupe really helps back up remote sites. Source dedupe also doesn't play well in large data centers. The backup speeds and (more importantly) the restore speeds are simply not YET up to what today's large data centers need, leading them to target dedupe for the data center (where bandwidth also happens to be less of an issue).
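The bandwidth trade-off behind Curtis's remote-site point can be sketched as a toy wire protocol: the source hashes its blocks, asks the target which digests it has never seen, and ships only those. This is a hypothetical illustration under simplifying assumptions (fixed-size blocks, an in-memory "server"); `DedupeTarget` and `source_side_backup` are invented names, not Avamar's or PureDisk's API:

```python
import hashlib

class DedupeTarget:
    """Stand-in for a backup server holding previously seen unique blocks."""
    def __init__(self):
        self.blocks = {}

    def missing(self, digests):
        # Tell the source which digests the target has never stored.
        return [d for d in digests if d not in self.blocks]

    def receive(self, digest, block):
        self.blocks[digest] = block

def source_side_backup(data: bytes, target: DedupeTarget, block_size: int = 4096):
    """Hash blocks at the source, then ship only the blocks the target lacks.
    Returns the number of payload bytes actually sent over the 'wire'."""
    blocks = {}
    for i in range(0, len(data), block_size):
        block = data[i:i + block_size]
        blocks[hashlib.sha256(block).hexdigest()] = block
    bytes_sent = 0
    for digest in target.missing(list(blocks)):
        target.receive(digest, blocks[digest])
        bytes_sent += len(blocks[digest])
    return bytes_sent
```

Run the same backup twice and the second pass sends zero payload bytes -- great over a thin WAN link to a remote office, but the per-block hash lookups are exactly the chatty, client-CPU-bound work that struggles to hit large-data-center restore speeds.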

Finally, I do feel the need to correct something Scott said in his comment: "We are the only ones really championing source side dedup. Symantec more or less gave up the ghost and went target side with PureDisk."

That simply isn't true. First, there are several smaller vendors that are championing it. You are the only major disk vendor to do so, but that's no surprise since you own Avamar. As to Symantec giving up the ghost, that's complete nonsense. Their technology is equivalent in many respects to Avamar. They are a source dedupe product through and through, with global dedupe, replication, etc. The "target side" bit is that they chose to prioritize expanding its use as target-side dedupe for non-PureDisk backups over making it more seamlessly integrated with NetBackup. EMC, OTOH, chose to prioritize making the NetWorker/Avamar relationship more seamless before doing other things. They have not given up the ghost, and I've actually personally witnessed some very large PureDisk deployments.

Chuck Hollis

Thanks, Curtis!



  • Chuck Hollis
    SVP, Oracle Converged Infrastructure Systems

    Chuck now works for Oracle, and is now deeply embroiled in IT infrastructure.

    Previously, he was with VMware for 2 years, and EMC for 18 years before that, most of them great.

    He enjoys speaking to customer and industry audiences about a variety of technology topics, and -- of course -- enjoys blogging.

    Chuck lives in Vero Beach, FL with his wife and four dogs when he's not traveling. In his spare time, Chuck is working on his second career as an aging rock musician.

    Warning: do not ever buy him a drink when there is a piano nearby.

    Note: these are my personal views, and aren't reviewed or approved by my employer.
