
August 30, 2008

Comments

Martin G

It's a good hornet's nest to stir up, to be honest! A bugbear of mine for a while has been trying to get true costs from vendors. Yes, I know it depends, but the question I am asking more and more is: assuming I follow your best practise, how much will it cost me to store a true terabyte of data? We are getting more sophisticated as well and beginning to profile our applications, so I might ask how much it will cost me to store a true terabyte of data with a certain I/O profile.

And BTW, if I follow your best practise, I want you to sign up to guaranteeing performance and capacity. I might not be looking for a money-back guarantee, but I am going to be looking to you to put some skin in the game, be it service credits, PS, free upgrades, etc.

Of course, you might get very conservative and completely over-specify, but hey, this is a competitive game: if you price too high, you lose, but if you under-specify to win the business and put yourself in a hole, you had better be prepared to give me a ladder to get us out of it.

I find Chuck's blog (and Barry's) very useful as it tends to generate great debate and this is all useful to me, the end-user. It's the nearest thing I get to sticking you guys all in a room and telling you to fight it out. So all, please keep posting (try to keep it polite tho')!

Ahmad

It is absolutely healthy to think of usable capacity, performance, availability, etc. It is all about the SLA, isn't it?

The big problem is that unless you have very clear application profiling, this does not help, and for that matter it is hard to find enough expertise on the customer and vendor side to do this exercise (virtually all vendors are willing to do it for BIG customers, but not many will do it for small shops).

Many vendors like to show you two pages of features without focusing on your problem, and things don't go as they should.

Ironically, many customers enjoy the debates.

This debate can be a good start, as MS Exchange has gone through good profiling, pushed by Microsoft ESRP and vendors' BPs... It is difficult to extend it beyond this, though.

Val Bercovici

Thanks for netting this out Chuck. The signal to noise ratio in your prior post plus comments was getting too low due to the volume of your hyperbole as well as the buzz of all those hornets flying around :)

On behalf of NetApp, let me unequivocally state in no uncertain terms that the *officially recommended* SAFE percentage of Space and Fractional Reservations for LUNs on NetApp FAS systems is ZERO.

That means in your prior example NetApp FAS efficiency goes up to 34+37=71% and we win your little circus sideshow. Pretty much as simple as you would expect NetApp to be!
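The arithmetic in this claim can be checked with a short sketch. It assumes that 34 is the usable percentage from the earlier comparison and 37 is the percentage that had been set aside as reserve; the price figure and the function name are made up for illustration and do not come from either vendor.

```python
def cost_per_usable_tb(price_per_raw_tb: float, usable_pct: int) -> float:
    """Cost of one usable terabyte, given the price of a raw terabyte
    and the percentage of raw capacity that ends up usable."""
    if not 0 < usable_pct <= 100:
        raise ValueError("usable_pct must be in (0, 100]")
    return price_per_raw_tb * 100 / usable_pct


usable_pct = 34    # assumed: usable share with the reserve in place
reserve_pct = 37   # assumed: share held back as reserve
reclaimed_pct = usable_pct + reserve_pct  # usable share if the reserve is zero

raw_price = 1000.0  # hypothetical price per raw TB, in any currency

print(reclaimed_pct)
print(round(cost_per_usable_tb(raw_price, usable_pct), 2))
print(round(cost_per_usable_tb(raw_price, reclaimed_pct), 2))
```

Whatever the right inputs turn out to be, cost per usable terabyte scales inversely with the usable percentage, which is why the reserve setting matters so much to this argument.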

FWIW - Martin’s comment above actually nails it as the true underlying customer requirement at this level of the IT stack. What can we as storage vendors offer that properly balances all four of:

1. Availability,
2. Performance,
3. Efficiency, and
4. Overall Cost?

NetApp’s focus and proven track record is on all 4, not merely one or two of the above. And I haven't even mentioned dedupe on primary storage yet :)

For those interested in finding out the difference between NetApp’s recommended and default settings for managing snapshot space (plus maybe a little history explaining some folks’ confusion on the topic) I will commit to posting an entry by the end of this long-weekend over on my blog:

http://blogs.netapp.com/exposed/

After all - I wouldn’t want to be responsible for ruining everyone’s final weekend of the summer by talking shop now, would I?

Val Bercovici
CTO-At-Large
NetApp

Chuck Hollis

Hi Val:

Forgive my skepticism on your claim of "zero" reserve for snapshots.

Is there a document you could point to that recommends zero reserve for block-oriented Exchange environments on FAS?

If there is, please send it along, and I'll be glad to change things.

We're just going by published docs. Not "CTO-At-Large" comments.

Many thanks.

Val Bercovici

Hi Chuck,

I would be happy to point you to a popular document many of our customers use for sizing Exchange 2007 environments, but first let's try a simple credibility test.

Can you point me to a single public document from EMC which recommends RAID5 (8+1) for Exchange 2007 as you imply in your series of posts the past few days here?

Mike Shea

I'll ask again - my first query went unposted. What has happened to you?

You used to take at face value the comments from those 'in the know'. Val now points out a conspicuous fact, and you fight about it.

You are losing any cred you had.

I have worked for EMC and NetApp. I know what I am talking about on both accounts, and you are very wrong here.

Chuck Hollis

Mike Shea -- I've been approving *every* comment that has shown up on this thread, including your first one. I can't explain why it didn't make it through. Please don't read too much into it.

Chuck Hollis

Hi Val -- fair request.

I'm taking the long weekend off, but will have someone respond first thing next week. I'll email you the link as well.

Cheers!

Geert

Taking the Dodge on your long weekend trip, Chuck?

MarinaG

any time you acknowledge a vendor's comments, you validate them.

Richard

Chuck,
You said "Sure, there's overhead for RAID 6 DP versus RAID 5 (an academic argument to be sure)"... It is not academic if CX4 RAID 6 causes the stated 60% performance penalty. Please explain.

Martin G

While we are all here talking about Useable Capacity, and there appear to be a number of vendors reading this, can you address one of my personal bugbears: the definition of a Terabyte? We are intelligent people working in the IT industry and should be capable of working in bases other than decimal.

Why is this important? It is very important for me and a number of other people, especially anyone trying to operate a recharge model!
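To make the recharge problem concrete, here is a minimal sketch of the gap between a decimal terabyte (10^12 bytes) and a binary one (2^40 bytes, usually written TiB). The constant and function names are illustrative only.

```python
TB = 10**12   # decimal terabyte, in bytes
TIB = 2**40   # binary terabyte (tebibyte), in bytes


def decimal_tb_to_tib(tb: float) -> float:
    """Convert a capacity quoted in decimal TB into binary TiB."""
    return tb * TB / TIB


def shortfall_pct(tb: float) -> float:
    """Percentage by which the binary figure is smaller than the
    decimal figure quoted for the same number of bytes."""
    return (1 - decimal_tb_to_tib(tb) / tb) * 100


print(round(decimal_tb_to_tib(1), 4))    # 0.9095
print(round(decimal_tb_to_tib(100), 2))  # 90.95
print(round(shortfall_pct(100), 2))      # 9.05
```

A department recharged for 100 "terabytes" sees roughly 9% less than that if its hosts report capacity in binary units, so the unit has to be agreed before any price per terabyte means anything.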

Chris M Evans

Chuck

Let me be the first to volunteer as the "independent party" to do your comparisons.

If the vendors in question are happy to provide access to hardware, I'll even do some performance comparisons....

KPC

Hi Chuck,
I have seen that overhead exists everywhere.

The percentage varies from vendor to vendor, and the applications or features turned on change the equation.

Say, on EMC TimeFinder the overhead is 100%, and on EMC Snapshots the overhead is, say, 30-50%.

On NetApp, the Snapreserve can vary between 20% and 200%, depending on what feature the user needs.

Similar overhead exists on HP or IBM.

It exists everywhere, and I think every customer understands this. There is no right or wrong in the overheads taken. Every customer will appreciate it if the overheads bring them better results.

Take, for example, instantaneous snapshot backups and restores: accept a 300% overhead if it means you can restore a 5 TB database in seconds.

Great! Go for the overhead and give me better results.

In a direct apples-to-apples comparison, a CLARiiON will have less overhead than other vendors. But with certain features turned on, like Snapclone or Snapshots, the story needs a relook.

Appreciate if you can help post this on the blog :)

Kostadis Roussos

One of the dangers of comparing storage features and usable capacity is that a given storage feature, snapshots for example, is not the same from vendor to vendor.

NetApp snapshots, because of their implementation, are used in fundamentally different ways.

A far more interesting question, one that you did ask, is why did the CLARiiON never have 100% reserve?

For that answer, you can check out my blog here

http://blogs.netapp.com/extensible_netapp/2008/09/why-the-clariio.html

Basically it boils down to the fact that NetApp and CLARiiON snapshots are used differently.

Paul P

Chuck,

Always entertaining - one of my favorite blogs that I check almost daily.

Not really sure how seriously I should take your arguments regarding capacity when we have some basic concepts that seem (to me) out of alignment.

Let's start with your RAID 5 and dual parity RAID comment.

In your previous thread, you disagreed that dual parity RAID is any different from RAID 5 with a global hot spare. Maybe I've missed something. To me, this would be like arguing that RAID 5 is really no different from RAID 0 with a (global) hot spare. Maybe I misunderstood you? Additionally, in the context of this discussion, can you explain how EMC's RAID 6 differs from EMC's RAID 5 (with a global hot spare)?

Lastly, I've not come across a single storage engineer who would default to RAID 5 for MS Exchange and the like. Best practice is RAID 10, is it not?

Microsoft has this to say:
http://technet.microsoft.com/en-us/library/bb738146(EXCHG.80).aspx

Rob

"HP stated that -- for our exercise -- 1 or 2 disk groups should have been used, and not 7. If they are correct as stated, our results are wrong, and we need to go recalc a bit."

The HP response to your post points to a best practice guide. If you look in the guide it states as few as possible. The HP response points to 2 groups. Alternating transaction logs and dbs.

"The concept is sometimes called "performance isolation", e.g. minimizing contention between demanding applications. On a traditional (e.g. non-virtualized) array, this is pretty easy to do: simply carve up LUN groups and hand them out for different purposes. No one will step on anyone else's spindles."

With close to 60 disks per group, if you are careful to alternate between the two (not much management there), the IO balance should be good to go, with a lot of underlying IO and bandwidth throughput.

Chuck Hollis

Sorry, everyone, I hereby declare blog comment bankruptcy -- I can't keep up with replying to all the comments!

I will try and summarize some of the major threads later, though ...

W. Curtis Preston

I'd have to agree that your comparison is not taking into account the typical use of snapshots on a NetApp vs the typical use of snapshots on a CX array.

The reason that most (if not all) documentation that you can find recommends a large snapshot reserve is that most NetApp customers use snapshots all the time. They make them hourly, daily, and weekly, and keep weeks or even MONTHS of history online. Those snapshots are then integrated into their data protection mechanisms using things like SnapManager for Exchange. And they get all those snapshots with the same performance as they get with no snapshots -- something you definitely can't say about CX arrays.

Now let's talk about CLARiiONs. If people make snapshots on a CX, it's typically to create a quick, fixed reference to back up for tonight only. That snapshot is then released the next day before the next snapshot is made, or MAYBE kept around for a few days. They're not typically kept around for weeks or months the way they are on NetApps.

You don't need much reserve to do that. This is why EMC's recommendation for snap reserve is much lower than NetApp's recommendation. People don't keep as much snapshot history, so they don't need as much snapshot reserve.

If you did the same number of snapshots on both systems, I think the amount of space taken up by snapshots would be roughly the same. (They're both block-level-incremental technologies.) The only reason they recommend a larger reserve is that people make more snapshots and keep them longer on NetApps -- because they CAN. And they can do so without a performance penalty. (Try keeping 30 or 60 days of snapshot history on a CX and see what happens to the performance. That's why people don't use ANY copy-on-write-based snapshot systems the way they use NetApp snapshots.)

If you were to use a NetApp like a CLARiiON (and only keep a few days' worth of snapshots), then the snapshot reserve would be roughly the same, and the whole point of your post would be moot.

And, BTW, I'd have to agree that comparing RAID 6 with RAID 5 and hot spares is just funny.
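The retention argument above can be sketched with a deliberately crude model. It assumes a constant daily change rate and that every retained snapshot holds one day's worth of changed blocks; real change rates overlap and vary, so treat the output as an illustration of the trend rather than a sizing rule. The function name and the sample figures are hypothetical.

```python
def snapshot_space_gb(lun_gb: float, daily_change_rate: float,
                      retention_days: int) -> float:
    """Rough space held by block-level incremental snapshots.

    Assumes (for illustration only) that each day a fixed fraction of
    the LUN changes and that each retained day keeps its own copy of
    those changed blocks.
    """
    if lun_gb < 0 or daily_change_rate < 0 or retention_days < 0:
        raise ValueError("inputs must be non-negative")
    return lun_gb * daily_change_rate * retention_days


lun_gb = 1000.0   # hypothetical LUN size
rate = 0.02       # hypothetical 2% of blocks changing per day

for days in (3, 30, 60):
    space = snapshot_space_gb(lun_gb, rate, days)
    print(f"{days:>2} days retained: {space:.0f} GB "
          f"({space / lun_gb:.0%} of the LUN)")
```

Under these assumptions the space needed grows with how long snapshots are kept, not with whose array they sit on, which is the point being argued: a few days of history needs a small reserve, while a month or two can approach or exceed the size of the LUN itself.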

Chuck Hollis

Hi Curtis

I'd expect a different point of view from you.

First, the primary reason such large snap reserves are *mandated* by NetApp in Exchange environments has little to do with customer preferences about how long snaps might be retained.

It has more to do with the fact that with NetApp's snap implementation, running out of snap reserve means the application (e.g. Exchange) fails on a write command, which is not a pretty picture.

Hence the recommendation for 100% snap reserves, coupled with strong warning language, in NetApp's documentation for Exchange environments.

The comparison made here has little to do with a "typical use case"; as mentioned before, it's based on each vendor's mandatory recommendations for an Exchange environment.

Go re-read the material I posted more carefully, please.

On a related note, do you consider keeping snaps around on the same physical array a "backup" per se? I mean, if the array has a bad day, you've lost your primary AND all of your "backups" in one fell swoop -- what are your thoughts on this?

Thanks!

Val Bercovici

Hi Chuck,

Seems to me like Stephen Foskett's (comparing apples and oranges) and Curtis Preston's (actually understanding your apples and oranges!) objective opinions are perfect bookends to this vibrant discussion.

FWIW - Even though you and the rest of EMC seem to routinely ignore it, I've gone ahead and described (again) how customers today safely provision far less than 100% LUN reservations using Exchange and other block-oriented apps using NetApp FAS arrays:

http://blogs.netapp.com/exposed/2008/09/emcs-storage-ca.html
(stay tuned for screen shots in another post tonight :)

Hint - you can automatically extend underlying volume sizes and/or proactively delete snapshots as you reach your snap reserve limit. All via simple system-wide, host-specific and/or enterprise-wide policies.
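The kind of policy described in this hint can be pictured with a toy model. This is not NetApp code and does not claim to reflect how any real array behaves: the class, the field names, the grow step, and the "grow first, then delete oldest" ordering are all assumptions made for the sketch.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Volume:
    """Toy volume: one LUN plus a list of snapshot sizes, oldest first."""
    size_gb: float
    max_size_gb: float            # hypothetical autogrow ceiling
    lun_used_gb: float
    snapshots_gb: List[float] = field(default_factory=list)

    def free_gb(self) -> float:
        return self.size_gb - self.lun_used_gb - sum(self.snapshots_gb)


def make_room(vol: Volume, needed_gb: float, grow_step_gb: float = 10.0) -> bool:
    """Try to free `needed_gb` for an incoming write.

    Step 1: grow the volume in fixed steps up to its ceiling.
    Step 2: if that is not enough, delete the oldest snapshots.
    Returns True if the write now fits, False if both steps ran out.
    """
    while vol.free_gb() < needed_gb and vol.size_gb < vol.max_size_gb:
        vol.size_gb = min(vol.max_size_gb, vol.size_gb + grow_step_gb)
    while vol.free_gb() < needed_gb and vol.snapshots_gb:
        vol.snapshots_gb.pop(0)   # drop the oldest snapshot
    return vol.free_gb() >= needed_gb


vol = Volume(size_gb=100, max_size_gb=110, lun_used_gb=80,
             snapshots_gb=[10, 10])
print(make_room(vol, needed_gb=15), vol.size_gb, vol.snapshots_gb)
```

In this model the interesting case is the False return: a volume already at its ceiling with no snapshots left to delete. What a real system does at that point is exactly what the two sides of this thread disagree about, and the sketch does not settle it.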

Chuck Hollis

Thanks for that, Val, as always.

But I still haven't gotten straight answers from you (or the rest of the NetApp blogging gang) on the following:

1 -- NetApp published documentation for Exchange clearly states a 100% reserve recommendation. Is the document wrong?

2 -- FAS defaults to 100% snap reserve in these environments, which is what customers usually end up running, right? Why is this?

3 -- Still left unanswered by NetApp is "what happens when I run out of snap reserve?" even after all automagic is applied. Provisional answer: your application crashes, period.

Sure, customers can change things (at their own risk, of course) but your protestations don't hold up until we understand these three facts.

Direct answers are appreciated, as always.

Stuart Harrison

Can I just ask - if a LUN on a CLARiiON had SnapShots enabled, and 100% of the data within that LUN changed, how much space would be taken in the SavVol? 10%, 20%?

Val Bercovici

Hi Chuck,

Even though you have no direct answer to why your CX4 configuration example violates all of EMC’s stated best practices and published benchmarks, I’m happy to answer your direct questions above:

1. The one NetApp Exchange whitepaper you refer to does indeed contain outdated information, which takes the most conservative approach in that one particular section. That section of the paper was authored before SnapManager for Exchange 5.0 was released. However, since the release of SME50, the Admin Guide for that product contains our latest recommendations, which enable NetApp Exchange customers to get as aggressive as they want regarding space efficiency for LUNs and/or snapshots – all at no risk to the application.

2. Because we can, and customers see enormous value in it :) See Curtis Preston’s salient comment right above for further explanation!

3. You’re dead wrong in your interpretation here. At the capacity limit, pre-configured automatic policies will either grow the underlying volume and/or delete snapshots to ALWAYS enable writes to the LUN. Period.

Our customers are welcome to change these default or recommended settings because they have the rich monitoring tools and automated intelligent policies to SAFELY maximize all of their important criteria at the same time:

1. Availability,
2. Performance,
3. Efficiency, and
4. Overall Cost.

Once again I will unequivocally state that NetApp’s focus and proven track record is on all 4, not merely one or two of the above. And I haven't even mentioned dedupe on primary storage yet :)

W. Curtis Preston

Sorry to disappoint you, Chuck, but I am firmly in NetApp's camp on this issue. I believe you are trying to make a point that doesn't exist. I'm not saying I like NetApp more than EMC or vice versa. I'm saying that the point you are trying to make doesn't exist.

1. You don't need a snap reserve if you're not going to make snapshots

2. The reason they recommend 100% is that this matches how people use snapshots on NetApps.

3. If they made snapshots on NetApps the way you make them on CX, you would need a snap reserve the same size as the CX array.

As to your "why don't I see this in the docs," it's not because it doesn't work or shouldn't be in the docs. It's because nobody at NetApp said, "Hey, do you suppose that we should document how NOT to use our single most differentiating feature?" (That being all the snapshots you have space for with no performance hit.) NOW, if you ask anyone who actually KNOWS something about NetApps (support, their CTO, me, anyone else who's actually USED a NetApp filer -- NOT a blogger for a competing vendor, no matter how forthright he may be), they will tell you that if you don't make snapshots, you don't need any room for them. Why is this so hard to understand?

As to your other question, of course I don't consider un-backed-up snapshots a backup per se. They're a backup against logical corruption, of course. But to recover against physical problems (double-disk failure in a RAID 5 array), you need to copy it somewhere else via replication (snapmirror, snapvault, qtree snapmirror) or via tape backups.

The comments to this entry are closed.

Chuck Hollis


  • Chuck Hollis
    SVP, Oracle Converged Infrastructure Systems
    @chuckhollis

    Chuck now works for Oracle, and is now deeply embroiled in IT infrastructure.

    Previously, he was with VMware for 2 years, and EMC for 18 years before that, most of them great.

    He enjoys speaking to customer and industry audiences about a variety of technology topics, and -- of course -- enjoys blogging.

    Chuck lives in Vero Beach, FL with his wife and four dogs when he's not traveling. In his spare time, Chuck is working on his second career as an aging rock musician.

    Warning: do not ever buy him a drink when there is a piano nearby.

    Note: these are my personal views, and aren't reviewed or approved by my employer.
