
April 29, 2010

Comments

John Dias

Chuck, when I saw this hit the fan I thought "oh man, here we go," and in fact I sympathized because... well, bad stuff happens to all of us from time to time. It's how you learn from it and react to it that defines your character.

I think your response to this is spot on and classy. Hopefully everyone in the industry will take this as a lesson learned and not an opportunity to trash a competitor.

Pat Adams

All valid points, Chuck. I'm one of your value added resellers. All I can say is that I'm not surprised by this. Whether intended or unintended, people take risks, sometimes too much risk.

It takes a village to get and keep storage right.

Service providers are under a lot of pressure to drive costs (including storage costs) down, but they need to know that they are taking risk and then manage it. It's not just EMC or the reseller that has responsibilities. Practice, indeed!

Chuck Hollis

John and Pat -- thanks for both of your comments -- they're appreciated!

-- chuck

Joe Svankanski

I was working as a contractor at a regional bank on a technology refresh: new production, dev/test and DR sites. When I suggested that we practice DR failover, etc. with their ops staff, the manager was aghast: it would cost too much! Taking people away from their 'day jobs', the cost of the contractors' time, what if something breaks? Some time after cut-over, they did have 'an issue'. I had to fly in super urgently! Why? Because the staff were 'scared' of activating the failover scripts. So very sad, and the vendor got blamed!!

Rick Parker

I agree with Chuck's point, but even more than practice, I suggest designing for failover as a standard procedure.

Rather than a primary and a backup, I suggest "A" and "B" systems, switching between the two every 3 to 4 months. Just because you can fail over for a few minutes does not mean you can run for a day or more, and if you do have to fail over, I expect it will be for more than a few minutes.

That being said, all vendors need to improve their monitoring AND reporting tools.

Jeramiah Dooley

Good post, Chuck. As a service provider, this incident hit home, and you can bet I did some digging to get more information on the backstory.

To your comment about "practice", one thing to remember is that (especially for something like SP load) keeping up to date with your FLARE code versions is a GREAT way to test that you have the capacity needed to be redundant. Since SP failover is part of the upgrade process, you kill two birds with one stone. It should be part of the normal management process on your array anyway, so it's an easy way to keep an indirect eye on your growth, even if you don't have any other monitoring set up, either through the array or through an additional monitoring tool.

Chuck Hollis

Jeremiah -- excellent advice to all -- thanks for sharing!

-- Chuck

Dominic Cody

Great post.

Where I currently work as an architect, we design failover and redundancy into every solution, whether we're using EMC or HP storage. This can be great, but it is massive overkill for some things; still, the policy stands because that's the way it has always been done here.
But you would think for something as important as email that someone would have done this, especially in this age where email is seen as one of the top business critical applications. How times have changed from email just being a form of communication to now being something people can't live without.

Chuck Hollis

Hi Dominic

What struck me about this case was that redundancy and failover was ostensibly designed in -- but not continually tested. I suspect that creeping I/O growth moved their design from "green" to "yellow" to "red" over time -- and that's the lesson I took from this.
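To make the "green to yellow to red" drift concrete, here is a minimal Python sketch. It assumes a dual-controller array where each SP's utilization is expressed as a fraction of one SP's capacity, and that after a failover the surviving SP must carry the combined load of both. The `failover_status` function and the 70% / 90% thresholds are hypothetical, chosen only for illustration; they are not EMC guidance or a vendor tool.

```python
# Hypothetical thresholds (assumptions for illustration only).
GREEN_LIMIT = 0.70   # surviving SP at or below 70% busy: comfortable
YELLOW_LIMIT = 0.90  # 70%-90% busy: degraded, probably survivable


def failover_status(sp_a_util: float, sp_b_util: float) -> str:
    """Classify failover headroom for a two-SP array.

    Each argument is the fraction (0.0-1.0) of a single SP's capacity
    currently in use. On failover, the surviving SP is assumed to
    absorb the combined load of both SPs.
    """
    combined = sp_a_util + sp_b_util
    if combined <= GREEN_LIMIT:
        return "green"
    if combined <= YELLOW_LIMIT:
        return "yellow"
    return "red"


# Creeping I/O growth: the same array sampled at three points in time.
# Neither SP ever looks busy on its own, yet failover headroom disappears.
history = [(0.30, 0.25), (0.40, 0.38), (0.55, 0.50)]
for a, b in history:
    print(f"SP A {a:.0%}, SP B {b:.0%} -> {failover_status(a, b)}")
```

The point of summing the two loads is that each SP can look healthy in isolation (55% and 50% in the last sample) while the pair has already drifted into "red": a failover would demand more than one SP can deliver. Only an actual, periodically exercised failover (or a check like this one) exposes that drift.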

-- Chuck

Dominic Cody

Hi Chuck,

Yes, true, a very good lesson.
A lot of organisations (mine included) could take a lot from this: just because something has redundancy and failover built in doesn't mean you can ignore it and assume all is well with the world.

Dominic

Frank Finley

Chuck, I understand your point and you are absolutely correct - all technologies can fail and will. However the issue I take here is with EMC's practice of throwing mud on other vendors' technologies' behavior in failure situations. Hopefully that practice will stop.

Chuck Hollis

Frank

I agree with you, any sort of mud-throwing is frowned upon. But, generally speaking, that's not our practice here.

Can you offer up any examples?

Thanks!

-- Chuck

Frank Finley

Chuck sorry for my slow response - my work takes me on the road a lot and I can't keep up like I used to.

Regarding your request for examples, I have to say I've seen the same from other sources as well. I'm glad to hear that it's not EMC policy to try to make the "other guy" look bad. Specifically, I'm talking about the "Storage Anarchist" blog about IBM's XIV product. It was pretty nasty, and my IBM rep tells me that it's not accurate. He does say that, like every other technology out there, there are instances where problems can occur, but I've never heard him talk about EMC failures (though they surely exist); instead, he steers the discussion toward customer experiences with his products rather than technical arguments about the other guy.

I have IBM and EMC in my shop and both are great products. I enjoy your blog when I get the time to follow it!

Chuck Hollis

Hi Frank -- thanks for making the time for a response.

Barry can be rather negative towards technology he feels is substandard for purpose. Although his accuracy level is extremely high, what is more subjective is whether or not his observations are relevant for your particular use case.

We both have issues where a product -- any product -- is oversold for a particular use case. Regarding XIV, I'm sure there are many situations where it will do fine, and many situations where it will not.

It's not a "bad" or "good" discussion; it's more of a "what is important to you?" discussion.

But your comments are spot on, so thanks for sharing.

-- Chuck

The comments to this entry are closed.

Chuck Hollis


  • Chuck Hollis
    SVP, Oracle Converged Infrastructure Systems
    @chuckhollis

    Chuck now works for Oracle, and is now deeply embroiled in IT infrastructure.

    Previously, he was with VMware for 2 years, and EMC for 18 years before that, most of them great.

    He enjoys speaking to customer and industry audiences about a variety of technology topics, and -- of course -- enjoys blogging.

    Chuck lives in Vero Beach, FL with his wife and four dogs when he's not traveling. In his spare time, Chuck is working on his second career as an aging rock musician.

    Warning: do not ever buy him a drink when there is a piano nearby.

    Note: these are my personal views, and aren't reviewed or approved by my employer.

General Housekeeping

  • Frequency of Updates
    I try to write something new 1-2 times per week; less if I'm travelling, more if I'm in the office. Hopefully you'll find the frequency about right!
  • Comments and Feedback
    All courteous comments welcome. TypePad occasionally puts comments into the spam folder, but I'll fish them out. Thanks!