
July 30, 2008


Joerg Hallbauer


I know how you feel. I've been in this business for about 30 years, and you know what I've noticed? We keep solving the same problem over and over again in much the same way. We often just change the name, and sure, some of the details change.

Sometimes technology has advanced enough to make something that wasn't a very good solution (like in your example) viable. But in general, a lot of what we are doing really is the same thing, over and over.

Take clustering as an example. DEC had a very nice, elegant solution to the problem in VMS. They knew that the problem really couldn't be solved by layering something on top of the OS, that you really needed to make it part of the OS design in order for it to be truly clean. But look at Open Systems clustering. Yup, it's a layer on top of the OS, which is nice, but doesn’t allow for the kind of seamless active/active clustering that VMS did.

That's just one example; HSM (now called ILM) was around on the mainframe for a long time. Why? Simple, they were trying to solve the same problem we are. How do you store data on the "right" kind of media based on the value of that data to the organization?

I have to wonder why we do this. Is it a lack of real new ideas being developed? I mean, if you think about it, the disk drive was pretty much invented in the 50's. Isn't it time we came up with something new to replace it?

As I see it, disk drives will go the way of tape: they will move from being primary storage to being used for backup/archive, just like tape moved from being primary storage to backup/archive. Tape will move from backup/archive to deep archive, and maybe eventually disappear.

But the question is what will replace the disk drive? I know that EMC has SSD, which I think will probably be the "next big thing", but I think that it will be an interim technology until "the next big thing" comes along that will revolutionize storage completely. Can you say "nanotechnology"?


Chuck Hollis

Yep. And if there's an IBM guy in the room, we'll always hear the "well, we invented it first" discussion ...

Couldn't agree more on the VAXcluster discussion (I was a fan as well), and the HSM discussion.

A minor point -- in the late 1980s, I worked for a company (Convergent Technologies) that built modular, networked workstations (NGens) and had an operating system (CTOS) and apps like you dream about -- all before PCs, ethernet, Microsoft, et al.

It was really great technology with really great ideas. And, like many other companies before its time, it's gone without any trace whatsoever. Sobering experience.

There are a few real big ideas floating around out there that I'm enamored with -- several appear in EMC's portfolio.
The challenge will be to turn them into sustainable, profitable businesses -- and stick around long enough to enjoy the rewards.

Thanks for writing!



I first want to say that a newsletter about "cloud computing" from the Motley Fool, Google, this link, and that link brought me here to your blog. I can't say that I am an IT professional, but I can say that I am a professional end user. The company that I work for is trying hard to move to "one place storage", or actually three place storage. Each place would contain the exact same data, and each would back up the others. I thought of this story when you talked about duplication. In this instance there is most likely "400000ication." Where I used to work, users would download large files to their desktops to make sure they had access at all times. These files are PDFs, so they are not manipulated, but space is still required on both ends. Just about everyone does it. The reason that we do this is because we hate to wait. If the data you need is off of the fiber network, forget about getting it anytime soon. The after-lunch crowd on the internet bogs IE down to a snail's pace. I do understand the concept and can give credit where credit is due. I just think this is like the PDA: it will be great, but it needs more. PDAs now must have cell, and in the future they must have GPS. In addition, they will need access to the sources. This costs money and it seems inefficient, but so did a lot of wonderful products. Thanks for what you do!

Alex McDonald

Strike "deduplication" and replace with "cloud" in your piece, and you've my wholehearted agreement.

The rest of us who have deduplication solutions (that even work on EMC stuff http://blogs.netapp.com/exposed/deduplication/index.html) are fixing real business and IT issues, generating practical solutions, with customers eager to get on with the job. These are customers that you should meet and talk with.

You won't need to travel by helium balloons tied to a pundit's deckchair to visit them either.

Chuck Hollis

Hi -- thoughtful, balanced and mature comments as always.

Always a pleasure conversing and collaborating with fellow professionals in our industry.

Please don't change!

Alex McDonald

Dedupe technology has moved on since Doubletake. 25 years of moved on. You're right, it's a flawed comparison.

So what about dedupe today? Yes, not all data dedupes equally well. Yes, not all data is worth deduping. Yes, it will be replaced by something else in the long term.

Those aren't insights, they're stating the obvious.

There are huge swathes of stuff that does dedupe, needs dedupe, and where the space recovered by dedupe is of value. Dedupe is one of those useful, here-and-now technologies.

That's obvious too. But, because EMC doesn't have it, it's an "infatuation curve"? Pffft!

If you think me thoughtless, unbalanced or immature for pointing out the vacuousness of your argument, so be it.

Chuck Hollis

Ummm .. let's take a quick tour of dedupe at EMC:

- client-side dedupe for backup (laptops, servers and file systems) with Avamar and Mozy, Avamar fully integrated with VMware, and the broader NetWorker framework -- both doing extremely well in the market, IMHO

- recently announced versions of dedupe for our successful disk libraries, with flexible options between in-line, post-processing, all built on CX technology, so has spin-down, etc. with good early adoption. I heard you're going to offer a specialized (e.g. non-OnTap) device in this category soon -- best of luck!

- and, of course, the platform that started it all with single-instancing, Centera

Do we have dedupe for primary tier 2 storage (e.g. NAS with very low service levels) -- yes, we do that with a combination of file virtualization and a variety of targets -- but I'm sure there's more coming -- there always is.

Glad you're aware of your tone in these discussions.

W. Curtis Preston

I agree that Alex was wrong for saying that EMC doesn't have dedupe. But I have to take issue with two claims in YOUR comment.

First, how did Centera "start it all?" Three years before EMC announced Centera (which you did in April 2002), there was a little company called Undoo. They announced their first product in February of 2002, and started shipping it around the same time you shipped Centera. The company would later change its name to Avamar and be acquired by EMC.

In addition, the original RockSoft dedupe patent (the one all the lawsuits are about) was also filed in 1999: http://www.patentstorm.us/patents/5990810.html. While they took longer to come to market, I think a patent filed three years prior to your product's announcement shows they were there before you. (Of course, you're also using that technology today in the Quantum software you're using in the DL3D line.)

Finally, how can you claim to have dedupe on primary storage? So you're saying I can store a bunch of VMware images on a Celerra filer or CLARiiON array and it will find the common blocks between all those images and eliminate them? Can you please point to the Celerra or CLARiiON datasheet(s) that explain(s) that feature?

Chuck Hollis

I have been surgically and professionally corrected!

Thanks for that illuminating bit of background -- much appreciated!

Chuck Hollis

Sorry, forgot the second part of your question.

And I may have gotten ahead of myself a bit here.

Conceptually, it's not a part of either Celerra, or CLARiiON, it's part of Rainfinity.

The idea is to virtualize files by moving them to appropriate service levels based on flexible policies, including a service level that (ahem, theoretically speaking) could be deduped.

As a result, there's a primary storage data reduction solution for NAS that works with, well, just about anybody's NAS environment, right?


W. Curtis Preston


You're a smart guy, and I believe the "ahem" part of your answer alludes to the fact that you know you're stretching about as far as you can here.

The Rainfinity file management appliance is a Hierarchical Storage Management (HSM) or (if you prefer) an ILM appliance. It will "move inactive unstructured data on a per-file basis to lower tiers of storage or to EMC Centera."

That's definitely not dedupe. That isn't even data reduction unless one of the tiers is Centera and there are files that are 100% duplicates of each other (which should be very few).

When the industry is talking dedupe of primary storage, we mean it in the strictest sense. Leaving your ACTIVE, HEAVILY USED data on a device that will examine and identify and remove the redundant blocks in that data. This is at a sub-file-level, which will not only find redundant files; it will also find redundant blocks between files.

Consider multiple VMware images stored on the same system, each with a copy of MS Office. Did you know that when you activate your copy of MS Office and put your initials in it, MS actually stores those initials in some of the executables, slightly altering them? A good primary storage dedupe system would find all the copies of WORD.EXE, identify the 99% of the blocks that are the same between all the VMware images, store them once, then also store the 1% of blocks that are unique for each file. And a GOOD one will do that WITHOUT changing the performance of the storage device. I'll leave it as an exercise to the reader whether your competitor meets all of these requirements, but the system you have described definitely does not.
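To make the file-level versus sub-file distinction concrete, here is a minimal sketch, not a description of any vendor's product. It assumes fixed-size 4 KiB blocks and SHA-256 fingerprints, and the helper names (`make_image`, `file_level_stored_bytes`, `block_level_stored_bytes`) are hypothetical, chosen only for illustration. Three synthetic "images" are identical except for one block that carries a user's initials.

```python
import hashlib

BLOCK_SIZE = 4096  # assumed fixed block size; real systems may use other or variable sizes


def split_blocks(data: bytes, size: int = BLOCK_SIZE) -> list[bytes]:
    """Cut data into consecutive fixed-size blocks."""
    return [data[i:i + size] for i in range(0, len(data), size)]


def file_level_stored_bytes(files: dict[str, bytes]) -> int:
    """Single-instancing: a file is stored once only if it is a 100% duplicate."""
    unique = {hashlib.sha256(d).digest(): len(d) for d in files.values()}
    return sum(unique.values())


def block_level_stored_bytes(files: dict[str, bytes]) -> int:
    """Sub-file dedupe: each distinct block is stored once across all files."""
    store: dict[bytes, bytes] = {}
    for data in files.values():
        for block in split_blocks(data):
            store.setdefault(hashlib.sha256(block).digest(), block)
    return sum(len(b) for b in store.values())


def make_image(initials: bytes, blocks: int = 100) -> bytes:
    """Synthetic image: block 0 holds the initials, the other 99 blocks are common."""
    first = initials.ljust(BLOCK_SIZE, b"\0")
    body = b"".join(i.to_bytes(4, "big") * (BLOCK_SIZE // 4) for i in range(1, blocks))
    return first + body


images = {
    "vm1": make_image(b"WCP"),
    "vm2": make_image(b"CH"),
    "vm3": make_image(b"AM"),
}

raw = sum(len(d) for d in images.values())
print("raw bytes:        ", raw)
print("file-level stored:", file_level_stored_bytes(images))
print("block-level stored:", block_level_stored_bytes(images))
```

In this toy case file-level single-instancing saves nothing, because no two images are exact duplicates, while the block-level store keeps the 99 shared blocks once plus one unique block per image. The sketch says nothing about the performance requirement, which is the hard part in practice.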

It's not really dedupe unless you're doing it at a sub-file level. Even if you would use Rainfinity to "move inactive unstructured data on a per-file basis" to a NAS mount point presented by a DL3D 1500/3000 (EMC dedupe device), that wouldn't be dedupe of primary data. It would be dedupe of inactive data.

Chuck Hollis

All valid points. Very valid points, I'd offer.

And I've said about as much as I can ... ;-)

The comments to this entry are closed.

