
February 06, 2008

Comments

Richard

Chuck,
So...how is this different from pNFS?

Dan Pancamo

You're right, no one knows about MPFS... First time I've heard of it :(

MPFS scales by adding clients; however, it appears each client is still limited to less than 1 Gbps... Sounds like every other clustered solution out there...

So why haven't you guys added an MPFSi agent to ESX?

MPFSi and 10GbE on ESX would be very interesting.

Actually, I just realized why we have an NSX in the closet... TOO COMPLICATED. Make it simple with one pretty GUI and add all the "snap" features...

Why is VMware so popular? It's simple! Even I can install ESX in minutes. Make a storage system fast, redundant and SIMPLE and you have a product.

Sajeev

>Sometimes I chuckle at the NAS vs. SAN vs. iSCSI debate.

>Wouldn't it be cool if you had a single, integrated environment that could use them all, each to their strengths? Cost, performance, management, protection, etc.?

I thought NetApp filers did this all the time. Good that EMC too is thinking along those lines :-)

Chuck Hollis

Hi Richard

On the client side, it's really the same thing. We see pNFS evolving into the preferred client for MPFS environments, but -- like most things -- it's a long train coming, and not everyone can wait.

However, it's fair to point out that pNFS addresses the client issues (not the back-end storage, etc.), so all of us vendors will have to come up with something special to support what pNFS does. As we've done.

Finally, in my mind, pNFS will likely be a direct replacement for the current MPFS client that we're shipping today.

Chuck Hollis

Hi Dan -- interesting thoughts, to be sure.

First, MPFS doesn't scale by adding clients. The bandwidth delivered is independent of the number of clients, and is more dependent on the storage subsystem, so I think your mental model isn't 100% accurate. And, trust me, it is different than every other clustered doo-hickey out there. Whether it's better or not is your call, not mine!

You're right on regarding the ESX comment. We're working to do just that, but it's not as easy as it sounds for a variety of reasons that I don't want to get into.

However, we're finding that most current ESX deployments run just fine on NAS, iSCSI, etc. and can't really justify this sort of honkin' performance. Yet.

For me, the arrival of FCoE as part of a broader DCE architecture will make this sort of thing even more interesting, i.e. one wire that does it all.

Finally, I'd suggest you take a look at a current Celerra. The guys got the simplicity message loud and clear a while back, and we've got videos of 9 year olds as well as marketing VPs setting the thing up in 15 minutes or less. Ditto for snaps, etc.

And, yes, even NetApp customers have told us we've done them one better in this regard. You be the judge!

Thanks for writing!

Chuck Hollis

Hi Sajeev, yes and no.

Both EMC and NetApp offer storage units that give customers flexibility of choice, e.g. choose a bit of iSCSI, a bit of NAS, or a bit of FC. Nothing unique here, right?

I don't think NetApp would attempt anything like MPFS, which uses NAS for what it's good at (management, metadata, file sharing) and hybridizes SAN (FC, iSCSI) for high-speed data delivery to the same clients.
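To make that hybrid model concrete, here is a minimal conceptual sketch -- hypothetical class names and stub calls, not EMC's actual FMP protocol or API -- of a client that fetches a file's block layout from the NAS head and then reads the data directly over the block path:

```python
# Conceptual sketch of an MPFS-style split (hypothetical names and stub
# classes, not the real FMP wire protocol): metadata and layout come from
# the NAS head, bulk data moves directly over the SAN/iSCSI path.

class NasHead:
    """Stands in for the file server: namespace, locking, block maps."""
    def lookup_layout(self, path):
        # A real system would return the file's extent list from the
        # file system's metadata; these tuples are placeholders.
        return [("lun0", 0, 1 << 20), ("lun0", 1 << 20, 1 << 20)]

class BlockPath:
    """Stands in for the FC/iSCSI data path as seen by the client."""
    def read(self, lun, offset, length):
        return b"\0" * length    # placeholder for a direct block read

def mpfs_style_read(path, nas, block):
    """Fetch the layout via the NAS head, then read over the block path."""
    data = bytearray()
    for lun, offset, length in nas.lookup_layout(path):
        data += block.read(lun, offset, length)
    return bytes(data)

print(len(mpfs_style_read("/fs/bigfile.dat", NasHead(), BlockPath())))  # 2097152
```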

That's kinda a different deal, I'd argue.

Thanks for writing!

shaun noll

"The guys got the simplicity message loud and clear a while back, and we've got videos of 9 year olds as well as marketing VPs setting the thing up in 15 minutes or less. Ditto for snaps, etc."


hahahah.... comparing the smarts of marketing VPs to 9 year olds....


That is hilarious. Chuck, would you mind posting one of those "9 year old kid sets up a SAN" videos? Maybe it's already on here, but I didn't see it -- would be cool to see it.

mgbrit

I remember MPFS from 2001 when EMC first discussed the technology.
I have to think that if, 7 years later, there are just 100 customers on this, it's due to the enormous cost of this type of implementation.

A proprietary NAS front end, plus the burdened cost of an FC back end. This may have changed, but as I recall, this was qualified for Symmetrix and not the lower-cost-per-TB CLARiiON. Customers have to invest not only in a high-performance, low-latency IP network but also in the Fibre Channel fabric.

I was actually thinking that MPFS would largely die once 10Gb Ethernet arrived. Now you can have your file system AND the performance that you require to go with it. 10Gb FCoE may offer even greater advantages.

As the adoption of 10Gb grows and the price per port comes down, this becomes an even more interesting scenario.

Chuck Hollis

Good comments, as always, mgbrit!

You're right -- at the outset, it was an expensive and complex proposition, but not any more. I tried to highlight that the shape of the adoption curve has dramatically shifted recently, due to lower costs, improved simplicity, and new customer requirements. Really, really shifted.

The entry costs of a Celerra have come way down, and it's much, much easier to use than before. Yep, it's proprietary, but there's nothing in the commodity world (yet) that does this sort of thing, so that's the tradeoff.

FC doesn't have to be part of the equation, though. iSCSI seems to offer better latency / response time performance than NAS protocols, so there's definitely a segment of customers who'd be better off with this approach without having to invest in FC.

The CLARiiON stuff has been qualified for a while -- a very, very long while.

I'd agree with the 10Gb FCoE observation in spades. A converged wire, supporting multiple protocols, each doing what they're good at -- well, that sounds pretty cool to me.

Thanks for writing!

Barry Whyte

Sounds like another vendor lock-in to me (nobody wants all those proprietary clients) -- maybe that's why only 100 in 6 years... 17 a year... must still cost a fair packet to keep the development running that long...

Chuck Hollis

Hi BarryW, how's the IBM blogging going?

What does IBM have to offer in this space (other than resold products from NetApp)?

I seem to remember something called "Storage Tank" from a while back ... is that still an active initiative at IBM?

Cheers!

Charlie Lavacchia

First, a disclaimer - I work for NetApp and the comments below reflect my personal view and NOT that of NetApp.

Here's a reality check, Chuck - every customer I talk with is trying to reduce the complexity of their IT environment, not add to it.

You've blogged extensively on the subject. So why the sudden embrace of more complexity?

Hopefully it's intuitive that simultaneously managing and coordinating the deployment, change management and troubleshooting of parallel SAN and NAS interfaces ADDS complexity to the IT environment rather than reducing it. The prospect of managing zoning and access controls, auditing, security, etc. for a proprietary product will keep the data center managers and most of the CIOs I talk with awake at night.

Chuck Hollis

Hi Charlie

With all due respect, I think you're missing the point a bit.

Yes, people want simplicity, and energy efficiency, and security, and something that's cheaper, and shorter workweeks ... well, fill in the list.

Just because I'm talking about a high performance option to NAS that some might find interesting doesn't mean we've solved world hunger here.

And, well, I've met more than a few people who are willing to take on a bit more complexity to solve a pressing business problem.

Like getting a ton more performance out of their NAS environment.

Does NetApp have anything that's architecturally equivalent that offers a simpler management model?

Lars Albinsson

Hi Chuck

Are you taking enhancement requests :-)

We are about to implement RSA enVision using NAS for its storage, because the RSA solution needs to share the storage between 3 nodes over CIFS (Celerra and NetApp are supported). So we have now bought an NS40 that we are connecting to a DMX3.

Here's the enhancement request: if we were able to use MPFS, we could offload the NS40 and use it for NFS/CIFS/iSCSI services, and also boost the IO performance for the RSA enVision application... But it's not supported. Do you have good contacts with the RSA division?...

Thanks for writing!

ming zhang

Just curious. After checking out http://searchstorage.techtarget.com/news/article/0,289142,sid5_gci1158962,00.html and http://searchstorage.techtarget.com/news/article/0,289142,sid5_gci1240230,00.html and other similar posts, it looks like EMC gained some deals by partnering with Ibrix, and the Ibrix FS looks pretty much like MPFS. So which one is better?

Sorin Faibish

Some people think that MPFS is named after the managers at the time of the product: Mark and Percy File System.

Not named by engineers but after engineers.

sorin

I want to address the "only 1 Gigabit pipe" limitation. Today you can find 4 Gigabit port cards, and with good old PowerPath one can increase the speed to 300-400 MB/sec per client with the MPFS product. But why stop there? You can use 10Gbit or even InfiniBand and scale up to 900 MB/sec for a single client (actually, we measured this in the lab and got 910 MB/sec using IB). But the real question is what anybody would do with these huge performance numbers. Yes, it is nice to know it can be done, but it is more important to feed a large number of hosts at 100-117 MB/sec than a single huge host. This is my personal view, but maybe I am wrong.
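As rough arithmetic behind those numbers -- a back-of-envelope sketch using nominal link rates and assumed encoding/protocol efficiencies, not lab measurements -- converting link speeds to usable MB/sec shows why a single GigE pipe tops out around 115 MB/sec, why a 4Gb FC port lands near 400 MB/sec, and why 10GbE or InfiniBand is needed to approach 1 GB/sec per client:

```python
# Back-of-envelope link-rate arithmetic (nominal rates and assumed
# efficiencies; real-world numbers depend on protocol, host, and array).
links = {
    "1GbE":      (1.0,  0.94),   # rough TCP/IP efficiency on GigE
    "4Gb FC":    (4.25, 0.80),   # 8b/10b encoding on Fibre Channel
    "10GbE":     (10.0, 0.90),   # jumbo frames + TCP/IP overhead (rough)
    "IB 4x SDR": (10.0, 0.80),   # 8b/10b encoding on InfiniBand
}

for name, (gbit_per_s, efficiency) in links.items():
    usable_mb_per_s = gbit_per_s * 1e9 * efficiency / 8 / 1e6
    print(f"{name:10s} ~{usable_mb_per_s:5.0f} MB/sec usable per link")
```

Multipathing software such as PowerPath then spreads a single client's IO across several of these links, which is how the per-client figures above stack up beyond what any one port can deliver.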

Chuck Hollis

Hi Lars

I take enhancement requests, but all I can do is pass them along to the product groups.

In this case, I believe that enVision can use NAS storage just fine -- ours and others. Depending on your appetite for adventure, everyone is telling me that it should "just work". Now, getting a bunch of engineers to do the work to make sure that's absolutely the case all the time, well, I'm sure that's going to take some time.

I'll let you know if I hear anything back from the enVision team. Now that you mention it, that's a pretty good use case for MPFS.

Chuck Hollis

Wow -- a 1 gigabyte per second per client data feed. Gee, that's a nice round number, isn't it?

Not that anyone needs that, but it's nice to know it's there if you want ;-)

Anyone know of anything faster (NAS-based, not using DRAM etc. for storage)?

Chuck Hollis

Hi Sorin

Percy T reminds me that the original name of the product was "HighRoad", and later it was some marketing dweeb (maybe me!) who changed the name to MPFS.

Chuck Hollis

Hi Ming

You're right, we do a lot of business with Ibrix -- they're a great partner.

I don't think in terms of "better", I tend to think in terms of "different".

Ibrix uses a clustered server approach to build a scale-out file system. MPFS uses a combined NAS/SAN hybridized approach.

I am not qualified to offer a semi-intelligent opinion as to which might be better in your environment. I don't know how serious you are about getting an answer, but talking to an EMC NAS specialist might be helpful to walk you through the pros and cons of each.

Sorin Faibish

Chuck,

The initial/internal name was always MPFS. It was changed to HighRoad to match competing product names like SANergy, CentraVision, and StorNext. Later, marketing revisited the MPFS name.

William Cleek

Great blog entry and responses. I will not claim to have more experience than anyone here, but there are some important interconnect and array performance concepts to remember when considering this solution.

1. Multi-tier load distribution.
MPFS is great for load distribution because it scales linearly. That implies that for each array element you gain close to a 100% performance increase from that element, assuming ideal distribution of data and IO requests. OK, ideal data and IO request distributions are HUGE assumptions. This is why we have to think of distribution at all levels / tiers: at clients, at interconnects, at storage head ends, and at intra-array levels. Implementing distribution within one tier while neglecting another is a sure way to get unexpected (IMO negative) results.

Examples of distribution techniques at each of these levels: MPFS and its total architecture, which addresses client and storage head-end distribution; multiple interconnect elements, including NICs/HBAs and their links and FC/Gig/10Gig switches, to lay the foundation for IO path distribution; and intra-array-element-aware IO distribution software such as PowerPath, to leverage multiple interconnect and intra-array paths to a single target.

Finally -- and this is the part that most CLARiiON users miss, since it's not an inherent part of its architecture -- there is intra-array distribution of data. On the CLARiiON platform, a vanilla configuration will almost ensure an unbalanced use of array spindles. In other words, you'll likely never see the IO that the platform is rated for. Implementing a MetaLUN matrix that evenly (with respect to LUN capacity) distributes each LUN across as many spindles as possible should be done to realize the most performance, regardless of questions like which client uses which RAID group. MetaLUN matrices make that question largely irrelevant (see the sketch below).

Some make arguments that would detract from the value of MetaLUN matrices: that by distributing LUNs in this fashion, the performance of every LUN is diminished. This argument supports the data-isolation technique for solving performance problems. My answer is that one must understand IO requirements, build solutions to meet those requirements, and, when IO resources are exhausted, add a new array. MPFS supports my argument nicely by making incremental scaling a breeze (while turning a blind eye to inter-array data redistribution after scaling, which is another can of worms).
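A toy sketch of the MetaLUN idea follows -- made-up sizes and names, not CLARiiON's actual layout algorithm -- showing how round-robin striping of a LUN's extents spreads the load across every RAID group rather than concentrating it on one:

```python
# Toy sketch of striping one LUN's extents round-robin across RAID groups
# (made-up sizes and names; not CLARiiON's actual MetaLUN algorithm).
from collections import Counter

RAID_GROUPS = ["RG0", "RG1", "RG2", "RG3"]   # hypothetical four RAID groups
STRIPE_MB = 256                              # hypothetical stripe element size

def metalun_layout(lun_size_mb):
    """Map a LUN's extents round-robin across all RAID groups."""
    layout = []
    for i, offset in enumerate(range(0, lun_size_mb, STRIPE_MB)):
        rg = RAID_GROUPS[i % len(RAID_GROUPS)]
        layout.append((offset, min(STRIPE_MB, lun_size_mb - offset), rg))
    return layout

# A 1 TB LUN ends up spread evenly over all four groups, so no single
# set of spindles becomes the hot spot.
extents_per_group = Counter(rg for _, _, rg in metalun_layout(1024 * 1024))
print(extents_per_group)   # each RAID group holds the same number of extents
```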

2. Interconnect theoretical capacity vs. real capacity.
This is the real gotcha that many gloss over, forget, or are completely ignorant about -- especially when talking about Ethernet, in flavors from 100Mb through 10G. The theoretical throughput is different from the practical throughput, and the difference comes down to administrative configuration. The administrative setting that makes the difference between realizing as little as 5% of theoretical throughput (in the case of GigE) and realizing something much closer to it is the Ethernet frame size, because the rate at which data can be put on the wire scales with the frame size. One can confirm this by doing a bit of research on the combined use of iSCSI and Ethernet jumbo frames.

This idea is fairly straightforward, but the per-host realized capacity problem with Ethernet is irreconcilable when 10G Ethernet is considered. The sad fact about 10G Ethernet, depending on one's goals, is that the Ethernet frame size cannot scale large enough for a single host to realize anywhere close to 10G's capacity. This leaves 10G relegated to aggregation links (or only effective for ignorant implementations). Multiple frame streams from multiple hosts being aggregated on 10G links can realize its capacity, but 10G doesn't solve any per-host throughput problems. Even I at one time wanted to throw 10G at each host, but the looks and advice from our Cisco SEs set me straight on this.
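To put rough numbers on the frame-size point -- a back-of-envelope sketch assuming standard Ethernet framing overhead, with real throughput also depending heavily on host CPU, NIC offloads, and the protocol stack -- the per-frame processing load a host must sustain at line rate drops by roughly a factor of six with jumbo frames, even though raw wire efficiency barely changes:

```python
# Back-of-envelope: frames per second a host must process at line rate,
# and raw wire efficiency, for standard vs. jumbo frames. Assumes the
# usual per-frame overhead (14 B header + 4 B FCS + 8 B preamble + 12 B
# inter-frame gap); host CPU and NIC offloads set the realized ceiling.
OVERHEAD_BYTES = 14 + 4 + 8 + 12

for name, gbps in (("GigE", 1), ("10GigE", 10)):
    for payload in (1500, 9000):          # standard MTU vs. jumbo frame
        frame = payload + OVERHEAD_BYTES
        frames_per_sec = gbps * 1e9 / 8 / frame
        efficiency = payload / frame
        print(f"{name:7s} payload={payload:5d}B  "
              f"{frames_per_sec:>10,.0f} frames/sec  "
              f"wire efficiency {efficiency:.1%}")
```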


A dedicated 1000/10G Ethernet interconnect, with properly configured Ethernet on the hosts, combined with iSCSI and MPFS, would be a rocking solution for both performance and resiliency.

I only wish EMC would hire me as an SE so I could do these implementations, just to see the looks on customers' faces when they see the performance stats.


Jeff Darcy

As one of the project co-originators, compiler of the first "rainbow document" FMP spec, and developer of the first functioning client on Solaris, perhaps I can clear up some of the naming.

The first name was Parallel NFS. Mostly because it also supported CIFS, it got renamed pretty early to Multi Path File System, and it remained that for most of its development.

This is the first time I've ever heard that the name was after Mark K (who in fact wasn't even there at the beginning with no other Marks in sight that I recall) and Percy T.

Given the politics of the project, I doubt that any such naming scheme would have given any credit at all to the Cambridge folks, so it probably would have ended up as UUXFS after the three people in Hopkinton who were most involved.

In any case, MPFS was not considered a go-to-market name. Close to the first release, marketing came up with HighRoad. None of the people actually working on the project liked it, but nobody was listening to us and that's what it ended up being.

I've actually been rather amused to see the term pNFS come back for something related, and now it seems that MPFS is back too. It's also gratifying to see that it's finally getting some traction in the market. At the time that I left, the sales force still seemed unaware of its very existence.

Kris

Sounds very interesting.

So where can I buy a 2-disk or 4-disk system like that? And if it's not available -- why not? The Celerra NS40 is only for VERY big companies. I think this could be a screamer in the small-office segment.

Chuck Hollis

@Kris

Most storage subsystems are spindle-limited in terms of performance. With 2 to 4 disks, that's the limiting factor, so not many people are interested in a small-config / high-performance combination.

The game changes when we move to enterprise flash drives, no? So one could imagine an extremely fast storage subsystem that uses flash to do exactly what you suggest.

Unfortunately, the price of such a device (today) would scare most people away. But I'm guessing that will change sooner than later.

-- Chuck

Taufik Kurniawan

Hi Chuck,
I didn't know where I should ask this question, and then I found your blog. It is fascinating to read -- like an ocean of knowledge about EMC, storage, virtualization, etc.

I have one question for which I really need an answer from you as an EMC CTO.

When do you plan to support dual-port or quad-port 10GBase-T (using RJ-45) in the Celerra line of products?

I am now designing and planning one big infrastructure framework, and I am still stuck on this missing link (my question above).

Should this information still be confidential, kindly send it to my email address at taufik (at) batelco dot com dot bh -- even if I have to sign an NDA with you.

cheers,
taufik kurniawan

Chuck Hollis

Hi Taufik

Not my area of detailed expertise, but I'll track someone down tomorrow and get back to you ASAP -- thanks!

-- Chuck

Chuck Hollis

Taufik -- I'm trying to send you an email at the above address(es), and they keep bouncing.

-- Chuck

Taufik Kurniawan

Dear Chuck,

I received your email. Thanks for your reply and confirmation. It is very valuable information.

I can start working on the infrastructure design based on your information :)

cheers,
taufik kurniawan

Abdellah

Hello Chuck,

I got to this site through a Google search on "ESX support of MPFS".

Very interesting article, but as asked by Dan, it would be really nice if EMC could give us the real state of MPFS driver compatibility with ESX.
The use of VMware can be explained by the licensing policies suppliers tend to link to the number of CPUs/cores; for a web application, sometimes 2 cores are enough, but performance when delivering content is crucial.

Could you please direct me to one of your experts I could talk to regarding the integration of an ESX farm and MPFS, as my project is tightly coupled with the possible adoption of the MPFS protocol with our ESX farms and EMC devices?

Thanks a lot

Chuck Hollis

I will check, and have someone get back to you as soon as possible.

-- Chuck

Sorin Faibish

I will try to address Abdellah's question, but I need to know whether you are interested in the MPFS client running on the ESX server or inside VMs. If inside VMs, we already support all VMs running Linux or Windows, using the native MPFS client on the given OS. If you mean an MPFS client in the ESX OS itself, we do not have such a client yet, but we are working on a prototype in our lab that can become a product if more people like you ask for such a client to be accepted by VMware. We can take the dialog offline if you need more details.

Edward

Hi all, I heard that EMC MPFS is a sunset technology, and I wonder if there is any design that can replace the existing MPFS in my environment? We have SAN storage. Thanks a lot.

Chuck Hollis

Edward

That's a fair statement. On one level, the direct replacement is pNFS (NFS 4.1), but there are not a lot of implementations on the market quite yet. More broadly, a lot of the rationale for MPFS (and pNFS) has diminished over the years. The original goal was the convenience of a file-system management model combined with the predictable low latency of a block storage model -- which is not quite the concern it used to be, for a long list of reasons.

-- Chuck

The comments to this entry are closed.
