
November 11, 2008


Barry Whyte

I have to agree for the most part.

We have a saying over here (maybe you do too)... "the proof of the pudding is in the eating..."

The problem I see with raw "flash as disk" is that the user has to decide which one or two applications to put on them. After all, they ain't cheap.

The problem with "flash as cache" is that, if it really were that simple, we'd all be putting TBs of cache in our controllers today.

Somewhere in between, now that's interesting ...

Chuck Hollis

Barry -- yes, very interesting indeed!

David Flynn

"Flash is not as fast as DRAM, so the implication is that there will be a level of logic that decides which data lives in DRAM, and which lives in NAND. Those algorithms are important to get the most out of any caching scheme. Whether this function lives on the server, or lives on the array -- these algorithms are not off-the-shelf technology."


Great discussion, I wanted to point out a couple of related things, specifically to the quote above.

You are absolutely correct, NAND is not as fast as DRAM. It has much higher latency, and the bandwidth per-chip is much lower. However, the second of these two issues - bandwidth - can be compensated for through parallelism.

For example, an array of NAND chips can have bandwidth that's not far off from that of a DRAM memory module - roughly a gigabyte per second when it's done right. The next generation can double that to roughly 2 GB/s. And, if it's on PCIe, which chipsets have tons of, it can scale across multiple modules the same way DRAM does across its multiple slots. For example, we get 6 GB/s from a handful of our PCIe modules.
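As a rough illustration of the scaling argument above, here is a minimal sketch that assumes bandwidth adds up linearly across modules, which is an idealization. The function names are hypothetical, and the per-module figures are just the round numbers quoted in the comment, not measurements.

```python
import math

def aggregate_bandwidth(num_modules, per_module_gb_per_s):
    """Combined bandwidth, assuming perfect linear scaling across modules."""
    return num_modules * per_module_gb_per_s

def modules_needed(target_gb_per_s, per_module_gb_per_s):
    """Smallest module count whose combined bandwidth meets the target
    (same linear-scaling assumption)."""
    return math.ceil(target_gb_per_s / per_module_gb_per_s)

# Round numbers from the comment: ~1 GB/s per module today, ~2 GB/s next generation.
print(modules_needed(6.0, 1.0))  # 6
print(modules_needed(6.0, 2.0))  # 3
```

Real systems will fall short of linear scaling once a shared resource (PCIe lanes, chipset, memory bus) saturates, so treat the result as a lower bound on the module count.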

Latency, on the other hand, cannot be "fixed" by parallelism. However, in a caching scheme, the latency differential between two tiers is compensated for by choice of the correct access size. While DRAM is accessed in cache lines (32 bytes if I remember correctly), something that runs at 100 times higher latency would need to be accessed in chunks 100 times larger (say around 4KB).
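The arithmetic behind that rule of thumb can be sketched as follows. This assumes the simple proportional rule stated above (access size scales with the latency ratio) and rounds up to a power of two, which is my own assumption about how one lands on "around 4KB". The function name is hypothetical.

```python
def scaled_access_size(base_bytes, latency_ratio):
    """Scale the access size by the latency ratio, then round up to the
    next power of two. Returns (raw_size, rounded_size) in bytes."""
    raw = base_bytes * latency_ratio
    rounded = 1
    while rounded < raw:
        rounded *= 2
    return raw, rounded

# 32-byte cache line (the figure quoted above) and a 100x latency gap.
raw, rounded = scaled_access_size(32, 100)
print(raw, rounded)  # 3200 4096
```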

Curiously enough, the demand-paged virtual memory systems that were designed into OSes decades ago do indeed use 4KB pages. That's because they were designed in a day when memory and disk were only about a factor of 100 apart in access latency - right where NAND and DRAM are today.

The VM paging subsystems in today's OSes do use very sophisticated schemes (MRU/LRU look-ahead, etc.) for determining what should be paged in and out. A lot more focus has been put on this lately, as HDDs have steadily become "further away" from DRAM over time. A case in point is Vista's ReadyBoot work, which was all about making those heuristics "smarter", so users aren't waiting around for the application to page back in.

There's another important trend that affects this discussion - applications are increasingly being designed to take advantage of multiple CPU cores through multi-threading. What that means is that, while one thread is waiting for a 4K page to come in, other threads can proceed - thus keeping the CPUs busy at all times through the naturally occurring staggering of page loads. This can totally mask the higher access time to NAND. Then it's only a question of bandwidth and miss ratio...
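A back-of-the-envelope model of that masking effect, under loud assumptions: each thread alternates a fixed amount of compute with one page load, page loads from different threads overlap freely, and context-switch cost is ignored. The function names and the microsecond figures are made up for illustration and do not come from the comment.

```python
import math

def core_utilization(threads, compute_us, fault_us):
    """Fraction of time one core is busy when `threads` threads each
    alternate compute_us of work with a fault_us page load."""
    return min(1.0, threads * compute_us / (compute_us + fault_us))

def threads_to_hide_latency(compute_us, fault_us):
    """Smallest thread count that keeps the core fully busy in this model."""
    return math.ceil((compute_us + fault_us) / compute_us)

# Hypothetical numbers: 20 us of work between page loads, 60 us per page load.
print(core_utilization(1, 20, 60))      # 0.25
print(core_utilization(2, 20, 60))      # 0.5
print(threads_to_hide_latency(20, 60))  # 4
print(core_utilization(4, 20, 60))      # 1.0
```

In this simplified model the page-load latency disappears from the core's point of view once enough threads are runnable, which is the point made above; what remains is whether the NAND tier can deliver pages fast enough.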

In other words, if I have enough bandwidth to handle, say, a 25% miss ratio in my NAND tier, and if my application is multi-threaded enough, NAND will appear just as "fast" as the DRAM, because the mechanisms already exist in OSes today - and are actually not that far from being well tuned.
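The bandwidth side of that condition can be sketched as a simple check. This is one reading of the statement above, not the commenter's own formula: the miss ratio is taken to be the fraction of page-granularity accesses that miss DRAM and must be fetched from NAND, each miss moves one 4KB page, and the access rates are hypothetical.

```python
def required_miss_bandwidth(accesses_per_s, miss_ratio, page_bytes=4096):
    """Bytes per second that must be paged in from the NAND tier."""
    return accesses_per_s * miss_ratio * page_bytes

def nand_tier_keeps_up(accesses_per_s, miss_ratio, nand_bytes_per_s,
                       page_bytes=4096):
    """True if the NAND tier's bandwidth covers the miss traffic."""
    needed = required_miss_bandwidth(accesses_per_s, miss_ratio, page_bytes)
    return needed <= nand_bytes_per_s

# Hypothetical workload: 500,000 page accesses/s, 25% miss ratio,
# against roughly 1 GB/s of NAND bandwidth.
print(required_miss_bandwidth(500_000, 0.25))          # 512000000.0
print(nand_tier_keeps_up(500_000, 0.25, 1e9))          # True
print(nand_tier_keeps_up(1_500_000, 0.25, 1e9))        # False
```

When the check fails, the options in this model are more NAND bandwidth (more modules in parallel) or a lower miss ratio (more DRAM or better paging heuristics).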

Don't know if this changes your thinking at all about the problem, but I appreciate the chance to participate in the discussion - feedback welcome.

David Flynn
CTO, Fusion-io

Sudhir Brahma

Are drive manufacturers looking at EFD too? The hot-spots can be moved to these (like they do with existing onboard cache) but with reduced risks arising from power outages. Ditto for RAID controller HBAs and reduced headaches with cache battery.

Chuck Hollis

Yes, some of the drive manufacturers are looking at hybrid devices.

I think the interesting part will be the relative effectiveness across progressively larger "domains of optimization" -- are we optimizing over a single drive, a group of LUNs, an entire array, a group of arrays, multiple data centers, and so forth?

And, not to push it, but one useful way of thinking about Atmos is "optimizing over a really, really big domain".

Flash Designer

I came to this blog thinking that you were speaking about Flash design. As far as I can see, it is mostly devoted to flash as a way to store information.

The comments to this entry are closed.

Chuck Hollis

  • Chuck Hollis
    SVP, Oracle Converged Infrastructure Systems

    Chuck now works for Oracle, and is now deeply embroiled in IT infrastructure.

    Previously, he was with VMware for 2 years, and EMC for 18 years before that, most of them great.

    He enjoys speaking to customer and industry audiences about a variety of technology topics, and -- of course -- enjoys blogging.

    Chuck lives in Vero Beach, FL with his wife and four dogs when he's not traveling. In his spare time, Chuck is working on his second career as an aging rock musician.

    Warning: do not ever buy him a drink when there is a piano nearby.

    Note: these are my personal views, and aren't reviewed or approved by my employer.
