In the US, every car sold has a standardized EPA rating on fuel economy. Using a quaint measurement system of "miles per gallon", it's not precise, but it does give buyers a rough measure of comparative fuel efficiency.
And, of course, has given rise to the frequent disclaimer that "your mileage may vary".
Storage arrays may have a roughly analogous measure as well: usable capacity vs. raw capacity.
And is it time to start comparing the capacity efficiency of storage arrays the way we do cars?
Update Feb 17 2009:
While the specific conclusions reached in this blog post are now obsolete due to enhancements by the respective vendors, the general topic of storage efficiency is not.
I encourage all storage customers to take the time and effort to figure out what their usable capacity might be -- once all overheads are subtracted.
The differences are still significant, although getting to an accurate figure will take some effort, as can be seen by this post and its comments.
Why Is This Important?
My impression is that in the US, when gasoline was $1.25 a gallon, not too many people paid attention to that efficiency rating. Spike gas to $4 a gallon, and all of a sudden that EPA rating became very important indeed.
In the storage world, we have the luxury of constantly declining media prices, but an industry average growth in capacity that far exceeds the decline in raw costs. As a result, most organizations spend more on storage every year.
It's a fair question ...
How much raw capacity are you buying?
And how much of that do you get to actually use to store your data, once all the overheads are accounted for?
Creating A Standardized Measure
When it comes to fuel efficiency in the US, we have the benefit of a government-mandated standard for comparison. When it comes to storage, we have no such luxury, so we have to create one.
The proposal I'd offer is a like-for-like comparison as follows:
- a relatively standardized use case that most vendors document with specific recommendations (e.g. Microsoft Exchange)
- a decent number of usable disks, say, 120
- an idealized 146GB disk capacity
- configurations done in accordance with vendor-supplied recommendations for Microsoft Exchange environments
Now, the prerequisite disclaimers, before people start to flame me:
- Yes, every use case is different, but we have to pick one, and Exchange seems like a reasonable proxy of an application that most people have in their environments.
Although many vendors don't publish recommendations for other high-transaction-rate applications such as Oracle, SQL Server, SAP, etc. (EMC does, though), I think it's reasonable to extend Exchange findings to these use cases.
Conversely, I don't think it's reasonable to extend this sort of comparison to file serving, backup-to-disk, decision support and other applications with decidedly different profiles. Performance and application availability matter in the use cases we're targeting for this exercise.
- Yes, efficiency ratios play out differently in smaller configurations (say 10 or 20 disks) or larger configurations (say 500 or 1000 disks). Every mid-tier array has its optimum configuration points where the numbers play out better than others.
We didn't try to game this. We didn’t need to.
- Yes, disks are available in many different sizes, but the real issue here is spindles -- the same efficiency ratios play out whether we're talking 146GB disks or 1TB disks.
- Yes, vendor recommendations change all the time, but are usually a compromise between decent performance, decent availability, decent protection and decent management. Just like EPA gas mileage and individual driving styles, you're free to vary from these recommendations, but not without compromising something else.
And we don't want to have to resort to the storage equivalent of "hypermiling".
Where possible, we've provided links to vendor-supplied documentation. In some cases, these documents have since been removed from public view, so we'd recommend contacting your vendor if you'd like the most -- ahem -- updated version.
We did the best we could. If we got something wrong, let us know where we went wrong, and we'll fix it to the best of our abilities. Our goal is to eventually publish a series of white papers and tools that will help customers figure out the comparative storage capacity efficiency for themselves.
So, consider this a sort of preview of future materials.
As a starting point, we're going to look at a best-practices efficiency comparison for EMC's recent CX4, NetApp's FAS series, and HP's EVA. All are offered by their vendors as mid-tier arrays that support these sorts of environments.
Once the efficiency ratios are calculated, they can be expressed in two ways: as a "percent efficiency" of usable vs. raw capacity (ranging from 34% to 70% in this exercise), or in terms of price deltas -- because a less capacity-efficient array means you pay much, much more for each unit of usable capacity.
For those of you who are keeping an eye on rack space, we've included that as well. We haven't yet included energy factors (power and cooling), but we'd like to do that in the future.
Ready to dive in? I think you'll find it interesting.
EMC CX4 -- 70% Storage Capacity Efficiency
As far as arrays in this category go, the CX4 is near the top of practical storage capacity efficiency without compromising performance, availability and management. Sure, there's some overhead (as we'll see), but -- compared to many alternatives -- it looks very attractive.
Hot spares -- EMC recommends setting aside 1 disk in 30 as a hot spare. Hot spares speed recovery of failed disks and provide an extra measure of availability. The CX4 uses a global hot spare scheme, which means that a small number of hot spares can protect a much larger number of production drives.
Snapshots -- EMC best practices call for reserving 10 to 20% of capacity as a snapshot reserve. If you run out of snapshot reserve and more can't be dynamically added, the snapshot fails -- not the application itself.
RAID Parity -- EMC recommends RAID 5 for Exchange environments, configured as 8+1 (8 data disks, one parity disk) which means 15 total RAID groups and a total of 120 usable drives.
Overhead -- All arrays use differing amounts of raw capacity for internal management features. A portion of the first five drives is used to create a vault for storing both the FLARE code as well as a safety area for the contents of cache -- the remainder of these drives can be used for available capacity.
In addition, all CLARiiONs store data in 520-byte sectors rather than 512. The extra 8 bytes is used to provide an additional layer of data integrity, further reducing available capacity by 1.5%. Not all vendors offer this additional protection against data corruption. More on this here.
Running the numbers, we see that a CX4 offers 70% usable capacity when configured with RAID 5 and 10% snap reserve, per EMC recommendations. Choosing RAID 6 instead of RAID 5 decreases this to 65%. Electing a 20% snap reserve decreases efficiency to 65% for RAID 5, and 61% for RAID 6.
For those of you counting physical space as well, this results in 12 drive shelves.
The chart appears below.
Source: EMC CLARiiON Best Practices for Fibre Channel Storage: FLARE Release 26 Firmware Update -- Best Practices Planning
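The arithmetic behind these figures can be sketched in a few lines of Python. This is a simplified model, not EMC's sizing tool -- the vault size per drive is an illustrative assumption, so the result lands in the neighborhood of the published figure rather than matching the chart exactly.

```python
# Simplified sketch of the CX4 usable-capacity math described above.
# The vault size (an assumed 33 GB slice on each of the first five
# drives) is illustrative, not a published EMC figure.
import math

DISK_GB = 146

def cx4_usable_pct(data_disks=120, raid_width=8, snap_reserve=0.10):
    groups = data_disks // raid_width           # 8+1 RAID 5: 15 groups
    total = data_disks + groups                 # data plus parity disks
    spares = math.ceil(total / 30)              # 1 hot spare per 30 disks
    raw_gb = (total + spares) * DISK_GB
    vault_gb = 5 * 33                           # assumed vault overhead
    sector_overhead = 0.015                     # 520- vs. 512-byte sectors
    usable_gb = ((data_disks * DISK_GB - vault_gb)
                 * (1 - sector_overhead) * (1 - snap_reserve))
    return usable_gb / raw_gb

print(f"~{cx4_usable_pct():.0%} usable with RAID 5 and a 10% snap reserve")
```

Switching the snap reserve parameter to 20% drops the figure by several points, matching the direction (if not the exact values) of the numbers above.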
HP EVA -- 47% Storage Capacity Efficiency
The EVA provides a very wide range of options for balancing performance, usable capacity and availability. But unlike on other arrays such as the CX4, once these choices are made, changing them can be very disruptive.
The EVA configuration choices include:
- Number of disk groups
- Number of proactive disk management events
- Type and number of disks in a group
- VRAID level (0, 1 and 5)
- Disk failure protection level (none, single, double)
- Cache settings
The EVA is slightly unusual in terms of how you think about disk overhead.
First, the EVA is built around the concept of "disk groups". HP recommends that separate disk groups be used to isolate performance characteristics. The more distinct high I/O applications you put on an EVA, the more disk groups. For certain cases like Exchange and Oracle, HP recommends that data and logs be separated to different disk groups.
Given that most arrays do other things in addition to just Exchange, we've assumed (as with the CX4 above) that an EVA with over 120 disks will perhaps be supporting things like SQL Server, Oracle and other high I/O applications.
We've decided to use 7 disk groups for this example: two for Exchange (logs and data), three disk groups for Oracle (two applications each requiring a separate disk group but a shared log), one for a SQL database (perhaps sharing a log file with the Oracle log disk group), and, finally, a disk group reserved for snapshot images with decent performance (e.g. MirrorClones).
Even though HP recommends Vraid1 for Exchange, we have elected to configure Vraid5 to maintain a decent comparison with the CX4 configuration above. Also in the spirit of fair play, HP recommends a 20% snap reserve for Exchange. We've elected to make this 10% to maintain a rough comparison with the CX4.
Hot Spares -- The hot sparing scheme is somewhat unique on the EVA -- there's no concept of a global hot spare. Hot spares are associated with individual disk groups. And, since everything is "virtual", HP recommends worst-case hot spare provisioning in the event that a user later elects to use, say, Vraid1 instead of Vraid5.
This means that the concept of a "virtual" hot spare has to be twice whatever the largest disk in the group might be, and you need one of these per disk group. For example, if 146 GB drives are used, the virtual hot spare area is 2x146 GB. If a single member of the disk group is, say, a 450GB drive, the virtual hot spare area must be 900GB.
There's an additional level of hot sparing per disk group as well, "Proactive Disk Management", which plays roughly the same role a second hot spare would in a global design. Unfortunately, this too is associated per disk group, and must be twice the size of the largest disk in the group. Each "Proactive Disk Management" virtual spare protects against a single "event" (e.g. disk failure) in a disk group. If you want protection against more than one event, you'll need more of these. We configured to protect against a single event here.
HP folks, if we got this wrong, our apologies in advance. You'll have to admit, it's a pretty intricate scheme.
If we go back to our configuration example, the 7 disk groups would require the equivalent of 14 disks of virtual hot spare capacity, plus another 14 for virtual proactive disk management spares. Fewer disk groups would require fewer virtual hot spares. No mention is made of potentially using larger capacity drives solely for the purpose of virtual disk hot spares, but I would assume that this would make for more difficult administration.
Snapshots -- This analysis uses HP EVA's "Virtually Capacity Free" snapshots, which require close to the same amount of capacity as a CX4 would. It should be noted that CX4 snapshots reside on separate disks to minimize production performance impact.
For the EVA to match this, HP must use "fully allocated" snapshots which would further impact usable percentages more than what is shown here. Again, in the spirit of fair play, we’ve given the EVA the benefit of the doubt on this one.
RAID Parity -- The EVA offers two parity RAID choices: 4+1 Vraid5 and 4+4 Vraid1. For consistency, this study uses Vraid5, even though HP recommends the less space-efficient Vraid1 for many performance-intensive environments.
As a result, for there to be 120 data disks, we'll need another 30 parity disks -- plus protection for the other overhead drives.
Overhead -- The EVA makes five copies of its OS and configuration data, which it distributes among the first five disk groups. At least one of the disk groups must be reserved exclusively for log files, and the smallest disk group supported is eight drives. Interestingly enough, no single log file spans more than one disk, meaning that seven of those eight disks are unusable, probably for performance reasons.
Finally, to allow EVA to complete its housekeeping in a reasonable time, EVA best practice recommends a further 10% of each disk group be given to the EVA operating system, known as "Occupancy Alarm".
Even with giving the EVA the benefit of the doubt in several areas, we still arrive at a 47% usability factor.
I think we're being generous here, though. Anecdotally, we routinely hear from EVA customers who see much lower usable capacity based on specific HP recommendations -- sometimes as low as 33%.
For those of you counting physical space, this is 17 drive shelves.
Here's the chart.
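The same kind of back-of-the-envelope sketch works for the EVA configuration described above. The figures come from the text, but the OS copies and other small overheads aren't modeled, so this simplified version comes out somewhat above the 47% shown in the chart.

```python
# Simplified sketch of the EVA overhead math described above.
# The minimum-size log disk group is an assumption from the text;
# OS copies and other small overheads are not modeled.
DISK_GB = 146

def eva_usable_pct(data_disks=120, disk_groups=7,
                   occupancy_alarm=0.10, snap_reserve=0.10):
    parity = data_disks // 4                  # 4+1 Vraid5: 30 parity disks
    # Per disk group: a virtual hot spare plus a PDM spare,
    # each sized at 2x the largest disk in the group.
    spare_gb = disk_groups * (2 + 2) * DISK_GB
    log_group_disks = 8                       # assumed minimum-size log disk group
    raw_gb = (data_disks + parity + log_group_disks) * DISK_GB + spare_gb
    usable_gb = (data_disks * DISK_GB
                 * (1 - occupancy_alarm) * (1 - snap_reserve))
    return usable_gb / raw_gb

print(f"~{eva_usable_pct():.0%} usable with Vraid5 and a 10% snap reserve")
```

Note how the per-group spare overhead scales with the number of disk groups: fewer groups would immediately improve the ratio, which is why the disk-group count matters so much on this array.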
But there's more to this than just efficiency.
If you assume, for example, that CX4 and EVA raw capacity is priced roughly the same, that means that every unit of usable EVA capacity is roughly 63% more expensive than the same capacity on a CX4.
And that's without factoring things like power, cooling and floor space.
Sobering, isn't it?
NetApp FAS Series -- 34% Storage Capacity Efficiency
Although the FAS series can be used in block-oriented environments such as Exchange, it does so by emulating a block storage device on top of its underlying WAFL file system. WAFL only performs well when it is guaranteed that free blocks are always available for new writes and snapshot data.
When it is used in this manner in high change rate environments (such as Exchange) and snapshots, NetApp often recommends significant amounts of storage overhead to ensure reliable operation and acceptable performance. This is often called "space reservation" or "fractional reserve".
Hot Spares -- For the FAS series, every disk that is not being used can be considered a potential hot spare. NetApp best practices state that each filer head should have a minimum of two hot spares for up to 100 disks, then two additional hot spares for every 84 additional disks.
For a 364-disk configuration, that means 2 spares for the first 100 drives, and 8 for the remainder.
Source: NetApp's Storage Best Practices and Resiliency Guide
RAID Parity -- FAS supports two parity RAID modes: RAID 4 and RAID-DP (a RAID 6 variant). NetApp's default for these environments is RAID-DP, organized as a 14+2 scheme. In our 364-disk configuration, this rounds up to 23 RAID groups, or a total of 46 parity drives.
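Both the sparing rule and the RAID group math just described are mechanical enough to sketch directly. This is a literal transcription of the rules as stated, not NetApp's own sizing logic, but it reproduces the disk counts above.

```python
import math

def netapp_spares(total_disks):
    # 2 hot spares for the first 100 disks, then 2 more
    # for every additional 84 disks (rounded up).
    if total_disks <= 100:
        return 2
    return 2 + 2 * math.ceil((total_disks - 100) / 84)

def raid_dp_groups(disks, width=16):
    # 14+2 RAID-DP: each 16-disk group carries 2 parity disks.
    groups = math.ceil(disks / width)
    return groups, 2 * groups

spares = netapp_spares(364)                    # 2 + 8 = 10 spares
groups, parity = raid_dp_groups(364 - spares)  # 23 groups, 46 parity disks
```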
Snapshots -- As mentioned above, WAFL doesn't want to run out of space when using snapshots. The question of the precise figures required for an Exchange environment seems to be a subject of controversy both inside and outside of NetApp -- some documents point to a 100% reserve space recommendation, while others suggest that it's reasonable to get by with a lesser amount.
One thing is extremely clear -- running out of snap reserve looks to be a very bad thing in a NetApp environment -- there's no place to put an incoming write, usually resulting in catastrophic application failure. By comparison, other approaches (e.g. CX4 and EVA) simply stop trying to capture before-write data if you run out of reserve -- the application itself continues to run.
From NetApp's "Microsoft Exchange Server 2007 Best Practices Guide", on Fractional Space Reservation:
"It is recommended to have a 100% space reservation value set for volumes hosting LUNs containing Exchange data. This guarantees sufficient space for all write operations on Exchange data volumes and ensures zero donwtime for Exchange users".
And from "Block Management with Data ONTAP 7G: FlexVol, FlexClone, and Space Guarantee":
"It is extremely important to keep in mind that a change in fractional reserve percentage might result in the failure of a write operation should the changine in that LUN exceed that percentage (frequent monitoring of space is recommended). Therefore, one must be sure of the change rate of the data in the LUN before changing this option. As a best practice, it would be best to leave it at 100% for a time".
I guess you've been warned ...
I would assume that this same sort of recommendation would apply to any write-intensive application: SAP production instances, Oracle transactional applications, and so on.
Overhead -- WAFL carries a high amount of space overhead, which comes in handy for a variety of reasons, but at a price.
First, WAFL must "right-size" (format) all drives to the same lowest-common-capacity denominator if drives come from different vendors.
Second, since WAFL is a file system there's overhead for the metadata that all file systems require.
Third, the WAFL design wants 1 in 10 blocks to be free so that application writes aren't delayed, which means an additional 10% of reserve at all times.
And, finally, there's another allocation for "Core Dump Reserve" should NetApp customer support want to take a look at an ONTAP core dump.
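Stacking these overheads gives a rough model of the overall FAS figure. The right-sized capacity per 146GB drive is an assumption (the exact value depends on the drive vendor), and smaller items like file system metadata and the core dump reserve are ignored, yet the result still lands close to the 34% figure above.

```python
# Rough model of the FAS overheads described above. The right-sized
# capacity per drive (136 GB) is an assumption; metadata and core
# dump reserve are ignored.
import math

def fas_usable_pct(total_disks=364, disk_gb=146,
                   rightsized_gb=136,          # assumed right-sized capacity per drive
                   wafl_free=0.10,             # keep 1 in 10 blocks free
                   fractional_reserve=1.0):    # 100% space reservation
    spares = 2 + 2 * math.ceil((total_disks - 100) / 84)
    groups = math.ceil((total_disks - spares) / 16)     # 14+2 RAID-DP
    data_disks = total_disks - spares - 2 * groups
    data_gb = data_disks * rightsized_gb * (1 - wafl_free)
    usable_gb = data_gb / (1 + fractional_reserve)      # equal space held for writes
    return usable_gb / (total_disks * disk_gb)

print(f"~{fas_usable_pct():.0%} usable with 100% fractional reserve")
```

Notice how dominant the fractional reserve term is: halving it to 50% lifts the result by more than ten points, which is why the reserve recommendation is the crux of the controversy above.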
Here's the chart:
This means that the NetApp FAS series achieves 34% storage capacity efficiency for these configurations, versus 70% for the CX4. If you're counting rack space, this results in 26 disk shelves.
And, for this use case, this means that a given unit of usable capacity on NetApp's FAS is approximately 2x as expensive as on a more capacity-efficient device, such as a CX4.
And again, that's not counting power, cooling and space.
Maybe you should ask them for a better discount :-)
So, What Does All Of This Mean?
First, I think there's enough variability in capacity efficiency that -- just perhaps -- we ought to start educating storage users on the difference between raw and usable capacities in real-world use cases.
If roughly the same usable capacity costs X from one vendor, and 1.6x from another vendor, and 2x from yet another vendor -- isn't that significant?
Second, as every IT organization takes a sharper look at power and cooling, the numbers get even bigger, don't they?
Even assuming all the associated controller and enclosure electronics were roughly equivalent in terms of power and cooling (definitely not the case, but assume it is just to simplify the discussion), doesn't it matter that a given usable capacity has X power/cooling requirement from one vendor, 1.6x from another vendor, and 2x from yet another vendor?
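The price math in these paragraphs reduces to a simple first-order model: if raw capacity is priced the same per GB across vendors, cost per usable GB scales inversely with capacity efficiency. (The pure inverse ratio comes out a little below some of the deltas quoted earlier, which also reflect shelf counts and other rounding.)

```python
def relative_cost(efficiency, reference_efficiency=0.70):
    # Price per usable GB is inversely proportional to capacity
    # efficiency when raw capacity is priced identically per GB.
    return reference_efficiency / efficiency

# Using the efficiency figures from this exercise:
for name, eff in [("CX4", 0.70), ("EVA", 0.47), ("FAS", 0.34)]:
    print(f"{name}: {relative_cost(eff):.2f}x cost per usable GB")
```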
Third, I think this storage capacity efficiency discussion is something that other vendors don't really want to talk about. But I believe that customers will want to start getting into the practice of requesting quotes for usable capacity, configured in accordance with vendors' published recommendations for their environments -- in addition to asking for power/cooling requirements.
My guess is that many vendors will argue that (a) they have features and benefits that offset their inefficient use of space, (b) we're wrong in our analysis, or (c) this really doesn't matter.
Just to head off the obvious, if there's a claim that your thin/virtual provisioning feature saves storage, or your dedupe feature saves storage, or whatever it is, please point us to where you recommend that customers use these features with performance-intensive and availability-intensive applications.
Regarding the second point, if you think we're wrong on some substantive point, please accept our apologies in advance -- and point us to the correct answer. As long as we're staying within the use case guidelines outlined above, we'll be glad to make the change.
But please, don't try and argue that this isn't an important discussion for customers ...
References, if you're interested, for HP stuff:
HP StorageWorks 8000 Enterprise Virtual Array and Microsoft Exchange Server 2003 -- storage performance and configuration (EVA best practices for Exchange 2003): http://h71028.www7.hp.com/enterprise/downloads/PerformConfig_Exchange2003_EVA8000_1105.pdf, pages 18 & 19.
This does not discuss in detail the additional disk group(s) for backup (presumably on FATA drives), but does mention a minimum of 2 disk groups (without backup) and more for isolation or performance.
Replicating Microsoft Exchange 2003 over distance with the HP StorageWorks Enterprise Virtual Array and HP StorageWorks Continuous Access white paper: http://h20219.www2.hp.com/ERC/downloads/5983-1550EN.pdf page 92 (shows 4 disk groups)
From HP StorageWorks 4x00/6x00/8x00 Enterprise Virtual Array Configuration Best Practices White Paper http://h71028.www7.hp.com/ERC/downloads/4AA0-2787ENW.pdf:
"Proper settings for the protection level, occupancy alarm, and available free space will provide the resources for the array to respond to capacity-related faults.
"Free space, the capacity that is not allocated to a virtual disk, is used by the EVA controller for multiple purposes. Although the array is designed to operate fully allocated, functions like snapshot,reconstruction, leveling, remote replication, and disk management either require or work more efficiently with additional free space."
"Additional reserved free space — as managed by the occupancy alarm and the total virtual disk capacity — affect leveling, remote replication, local replication, and proactive disk management.“ Figure 3 gives a great picture of things. Note that in our calculation we separated PDM and Capacity Alarm. This was to keep things understandable rather than lumping all the factors into Capacity Alarm. We chose ONE PDM event although this paper suggests two is also a good choice. For Capacity Alarm we chose a conservative 10% to include all items beyond one PDM event including replication log file, releveling overhead, etc.
This is the key one that HP took down: ftp://ftp.compaq.com/pub/products/storageworks/whitepapers/MSEx2003designEVA5000.pdf.
It showed 7 disk groups as a best practice for an Exchange 2003 configuration to support 8000 mailboxes.
And some NetApp (and IBM N Series) references to go look at:
Trading Capacity for Data Protection – A Guide to Capacity Overhead
- 0.5% additional FlexVol reservation -- Data ONTAP 7.3 Release Notes (p. 46)
- Drive right-sizing -- Data ONTAP 7.3 Storage Management Guide (p. 31)
- 100% space reservation in block environments -- Data ONTAP 7.3 Block Access Management Guide for iSCSI and FCP (p. 38)
Warning -- if some of these docs disappear after we post this, our apologies.