I was pleasantly surprised by the vigorous debate kicked off by one of my recent posts, "The Future Doesn't Have A File System".
Although most of the vigorous rhetoric came from an individual with a clear vested interest (Alex McDonald, the competitive blogger over at NetApp), he did bring forth some themes that I'm sure are more widely shared. And, just as vigorously, Paul Carpentier of FilePool (Centera) fame argued the other side of the equation.
I'd like to use this post to step back and lay some groundwork as to why this is such an interesting topic to so many of us looking at architecture.
Consider The Object
Perhaps one of the simplest concepts in computer science, an object usually boils down to a few standard components:
- an arbitrary amount of bits (data, code, whatever)
- a unique identifier
- an arbitrary amount of metadata to describe it to different parties
- and some sort of access method to invoke or use it
That's about it. It's about as clean an abstraction as you'll find. Application developers are very comfortable with objects (usually procedural code), and with assembling applications from collections of various objects.
Information architects are starting to get comfortable with the concept as well, but are a bit behind the pace of application developers in terms of appreciating this sort of model.
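To make the four components listed above concrete, here is a minimal sketch in Python. The class name `InfoObject`, its field names, and the sample metadata are all made up for illustration; they don't describe any particular product.

```python
import uuid
from dataclasses import dataclass, field


@dataclass
class InfoObject:
    """Hypothetical illustration of the four components of an object."""
    data: bytes  # an arbitrary amount of bits
    object_id: str = field(default_factory=lambda: uuid.uuid4().hex)  # a unique identifier
    metadata: dict = field(default_factory=dict)  # arbitrary descriptive metadata

    def read(self) -> bytes:
        """The access method: hand back the bits."""
        return self.data


# Sample object; the bytes and the metadata keys are invented.
photo = InfoObject(b"\x89PNG...", metadata={"owner": "chuck", "kind": "image"})
```

Note that nothing in the object says where it lives or how it is named in any directory; that is deliberate.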
Stored Information In Objects
Analogies can help here. One we've used for years is simple: parking your car at a really big, crowded event.
You might appreciate the value of valet parking -- hand your car over to someone, and get a ticket. Bring your ticket, get your car -- otherwise, there's a lot of explaining to do.
You don't really know where your car is parked, or if they had to move it around for some reason. You expect the valet service to return your car in the same condition as when you brought it. You expect them to keep your car reasonably secure during the event.
Some valet services go the "extra mile" and offer car detailing or other routine maintenance while you're at your event. Again, you're not involved in the specifics of how that gets done.
Compare that with parking your own car at a crowded event with lots of people, and you start to get an intuitive understanding of object-based information models. Yes, the analogy goes only so far.
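The valet analogy can be sketched in a few lines of Python. This is an illustration only: `ValetStore`, `park` and `retrieve` are hypothetical names, and the SHA-256 check is just one assumed way of expressing "returned in the same condition".

```python
import hashlib
import uuid


class ValetStore:
    """Hypothetical store: callers hold a ticket, never a location."""

    def __init__(self):
        # ticket -> (data, sha256 digest); the internal layout stays private
        self._garage = {}

    def park(self, data: bytes) -> str:
        """Hand over the data and get an opaque ticket back."""
        ticket = uuid.uuid4().hex
        self._garage[ticket] = (data, hashlib.sha256(data).hexdigest())
        return ticket

    def retrieve(self, ticket: str) -> bytes:
        """Bring your ticket, get your data, verified against its digest."""
        if ticket not in self._garage:
            raise KeyError("no such ticket: there's a lot of explaining to do")
        data, digest = self._garage[ticket]
        if hashlib.sha256(data).hexdigest() != digest:
            raise IOError("object is not in the condition it was handed over in")
        return data
```

The caller never sees `_garage`, so the store is free to rearrange it without anyone outside noticing.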
Presentation Of Objects Is Your Choice
Now we can go to the next step -- how do you want to see all your information objects? It's pretty easy to show them (if you choose) as a traditional hierarchical filesystem -- or even multiple hierarchies of the exact same information objects -- if you choose.
Those same object identifiers can be shown relationally in a database view -- if you choose. Or as web objects on the internet -- if you choose. Or as any other reasonable collection or schema. The same information object can be presented in multiple fashions concurrently -- if you choose.
The key point here is simple -- how the information objects are arranged and presented is completely separate from the information object itself -- a very useful property, if you think about it.
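Here is a small sketch of that separation, assuming objects are kept as a flat mapping from identifier to metadata. The identifiers, file names, and metadata keys are invented for the example, and `hierarchical_view` and `relational_view` are hypothetical helpers.

```python
# Objects keyed by opaque identifier; names and metadata are made up.
objects = {
    "a1": {"name": "q3-report.pdf", "dept": "finance", "year": 2009},
    "b2": {"name": "budget.xls", "dept": "finance", "year": 2008},
    "c3": {"name": "roadmap.ppt", "dept": "engineering", "year": 2009},
}


def hierarchical_view(objs, *levels):
    """Project objects into a path-like namespace built from chosen metadata keys."""
    return {
        "/" + "/".join(str(meta[k]) for k in levels) + "/" + meta["name"]: oid
        for oid, meta in objs.items()
    }


def relational_view(objs, **where):
    """Select the ids of objects whose metadata matches every column=value pair."""
    return sorted(
        oid for oid, meta in objs.items()
        if all(meta.get(k) == v for k, v in where.items())
    )


# Two different hierarchies and one query over the exact same objects.
by_dept = hierarchical_view(objects, "dept", "year")
by_year = hierarchical_view(objects, "year", "dept")
finance_2009 = relational_view(objects, dept="finance", year=2009)
```

Both hierarchies and the query resolve to the same identifiers; none of them changes the objects themselves.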
Scale And Location -- Not A Problem
We're presuming that in any serious discussion, we'll want to think in terms of billions or trillions of objects, maybe more. Not an inherent problem for an object-based information store, whereas traditional filesystems and databases tend to struggle at these levels.
There's also the fact that information objects will usually want to move their physical location during their lifetime -- not an architectural problem for object-based information stores.
Now, layer in current thinking about private clouds, and the ability to dynamically move workloads (and their information) wherever they need to be -- internally, externally, etc. -- and you can see at least part of the value of a location-independent information store.
Bring your claim ticket, get your information object back -- wherever it might be right now.
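A rough sketch of location independence, assuming the store keeps a private map from identifier to current site. The class, its methods, and the site names are hypothetical.

```python
class LocationIndependentStore:
    """Hypothetical store where the identifier never encodes a location."""

    def __init__(self, sites):
        self._sites = {name: {} for name in sites}  # site -> {object_id: data}
        self._where = {}                            # object_id -> current site

    def put(self, object_id, data, site):
        self._sites[site][object_id] = data
        self._where[object_id] = site

    def move(self, object_id, new_site):
        """Relocate an object; its identifier stays the same."""
        old_site = self._where[object_id]
        if old_site == new_site:
            return
        self._sites[new_site][object_id] = self._sites[old_site].pop(object_id)
        self._where[object_id] = new_site

    def get(self, object_id):
        """Same call before and after a move."""
        return self._sites[self._where[object_id]][object_id]


store = LocationIndependentStore(["internal", "external"])
store.put("ticket-42", b"payload", site="internal")
before = store.get("ticket-42")
store.move("ticket-42", "external")
after = store.get("ticket-42")
```

Contrast this with a full path name, which bakes a location into the name and has to be updated everywhere when the data moves.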
Taking a REST
As the world moves to more web-friendly RESTful and stateless protocols for application interaction, the simplest and most logical extension is to take this default interaction model and apply it to information objects as well. The notion of finding, reading, appending and writing files (or traditional database records) seems downright primitive in this context.
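As an illustration, here is a toy dispatcher that maps HTTP-style verbs onto object operations. It is a sketch under assumptions: the `/objects` URL layout, the status codes chosen, and the `handle` function are invented here and don't reflect any specific product's API. There is no real network server, just the request-to-operation mapping.

```python
import uuid

_objects = {}  # object id -> bytes; stands in for the object store


def handle(method, path, body=b""):
    """Dispatch one self-contained request; returns (status_code, payload)."""
    if method == "POST" and path == "/objects":
        oid = uuid.uuid4().hex
        _objects[oid] = body
        return 201, f"/objects/{oid}".encode()  # the new object's URL
    if path.startswith("/objects/"):
        oid = path[len("/objects/"):]
        if oid not in _objects:
            return 404, b""
        if method == "GET":
            return 200, _objects[oid]
        if method == "PUT":
            _objects[oid] = body
            return 204, b""
        if method == "DELETE":
            del _objects[oid]
            return 204, b""
    return 405, b""
```

Each call carries everything needed to service it, so there is no open file handle or session to keep alive between requests.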
The Magic Of Metadata
There's just so much potential in this concept -- the more metadata we associate with an information object, the easier it is to manage, protect, secure, find, etc. Trying to keep all this useful stuff in some sort of external repository or database is a severely limiting approach -- the information and its metadata can get badly out-of-sync.
So much cleaner to simply associate metadata with the object in such a way that they can never get separated, and never get out of sync.
More importantly, it's this metadata that's the foundation for all the cool things we all would like to do with information services around the object: serving it up at the right service level, placing it in the right location at the same time, figuring out how long to keep it or when we can delete it, enforcing security and compliance policies, how it relates to other objects and business processes, etc. etc. etc.
It's a very long list indeed.
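A couple of entries from that list can be sketched as policies that read nothing but the object's own metadata. The metadata keys (`created`, `retain_days`, `legal_hold`, `access`) and the tier names are assumptions made up for this example.

```python
from datetime import date, timedelta


def can_delete(meta, today):
    """Retention policy: deletable once the retention period ends, unless on hold."""
    if meta.get("legal_hold"):
        return False
    expires = meta["created"] + timedelta(days=meta["retain_days"])
    return today >= expires


def service_tier(meta):
    """Service-level policy: pick a tier from the object's access pattern."""
    return "fast" if meta.get("access") == "hot" else "archive"


# The metadata lives with the object, so it cannot drift out of sync with it.
invoice = {
    "data": b"...",
    "metadata": {"created": date(2009, 1, 1), "retain_days": 365, "access": "cold"},
}
```

Because the policies only look at metadata carried by the object, they keep working no matter where the object is stored or how it is presented.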
Don't Limit Your Thinking Here
A more interesting thought exercise emerges when you shift your perspective from individual information elements to larger composite structures.
For example, a virtual machine (app, OS, files, etc.) can be thought of as an encapsulated information object. A group of LUNs being used by a traditional Oracle database can be thought of as an object. A composite application of virtual applications (a vApp) can be thought of as an object. Instances of supporting infrastructure can be thought of as objects as well.
It's a very extensible paradigm :-)
I Will Concede A Point
File systems -- as we know them -- probably won't go away for a very long time.
But if you're developing a cloud-aware application from the ground up, you'll probably resist the temptation to think in terms of file systems and database records -- that much is clear. Indeed, spend any time in any of the newer development frameworks, and it's very hard indeed to get to a concept of a "file system" from their application abstractions.
So, maybe the future *does* have a file system -- with an object-oriented information store underneath!
Note: have fun at VMworld next week everyone -- EMC's got quite a show planned for all of you!
Chuck,
Good post.
The problem I have with your thesis is the definition of object store vs file system.
It seems you're saying that if the data is not organized in a hierarchical name space, is accessible via web-based protocols through a unique identifier, and has extensible, searchable metadata, then it must be an object store.
You then make the interesting leap that this represents the future of data organization.
So I have to step in and caution your enthusiasm. And let's be clear it's not because I happen to work at a company that sells NAS devices.
The reason the file system metaphor has been so durable is that the organization of the meta-data in a search-able format is a hard problem.
If you need to store billions of objects, and you need to search them, then you need an index. And it turns out that the directory tree happens to be the most efficient mechanism for representing the search-able index. This isn't that surprising. A directory tree allows you to partition the objects you're searching for into smaller manageable chunks that can in turn be searched. In the last 40+ years of computing there has yet to be a better way to do things.
Ultimately any object service will need some kind of name lookup service, and that name lookup service to scale to billions of objects will rely on some kind of hierarchical scheme which will end up looking like a directory.
I'll concede that the minute I said X was impossible, 2 minutes later there was someone who demonstrated that it wasn't, but ... I am fairly confident that hierarchical information organization is still necessary.
The only remaining question is whether the name lookup is necessary, or whether you can directly access the object without doing a "READDIR". I suspect the answer is going to be yes. Even today, an NFS client does not repeatedly do a LOOKUP once the filehandle is acquired. Instead, it accesses the underlying object (file) directly through a globally unique id.
If the directory as an organization format remains durable, then what we're arguing over is whether the interface of the future is "open", "read", "write", "close" and whether the meta data is stored within the file system or a bag on the side.
I suspect that file systems will evolve to support extended metadata within the file system, and that adding additional interfaces to manipulate data will also emerge. I expect that in 10 years the NAS device of the future will support protocols that are web based interfaces.
So if I do agree on one point it's that the file system of the future will have additional capabilities, will be accessible using different protocols, but that the central principles of a hierarchical namespace, with the ability to read and write into the namespace, will remain.
cheers,
kostadis
Posted by: kostadis roussos | August 28, 2009 at 08:17 AM
@kostadis
I think you missed the point. It may be the case that hierarchical file systems search faster in some situations. And we both can think of situations where that would not be the case as well.
Separating object from presentation (or search method, in your comment) still has unique merits. Object identifiers can be organized in traditional hierarchy if needed, a relational database if needed, or any other schema without impinging on object properties.
I disagree with your assertion that "ultimately any object service will need some kind of name lookup service, and that name lookup service to scale to billions of objects will rely on some kind of hierarchical scheme which will end up looking like a directory."
Categorically not true -- many successful counterexamples exist in the industry today.
I'm glad you somewhat agree with my point around metadata. It does not belong in a bag on the side.
Why do we have to wait 10 years for NAS devices to catch up with what's already available and proven in the market? That doesn't sound right, does it?
Thanks for writing ...
-- Chuck
Posted by: Chuck Hollis | August 28, 2009 at 08:54 AM
This is from Chuck
I've been watching Val over at NetApp blustering on about clouds, and I've resisted the temptation to "help educate" him on what is cloud and what is not-cloud.
Given that his motivations seem to be marketing-oriented rather than having an intelligent discussion, I'll leave him to his new role in marketing.
However, I do believe that it would be useful to offer up the observation that, in many cloud models, traditional filesystems strike many of us as woefully inadequate for the richer semantics of a cloud model.
So, maybe the title of the original post should have been "The Cloud Doesn't Have A File System".
Thoughts?
-- Chuck
Posted by: Chuck Hollis | August 28, 2009 at 08:59 AM
@chuck
We'll just have to agree to disagree.
kostadis
Posted by: kostadis roussos | August 28, 2009 at 11:22 AM
@kostadis
Opinions and long-held views take a long time to change.
I just had an ugly flashback to my first "real" programming job (in high school!) where I was schlepping COBOL code at the time.
My wizened supervisor insisted that the optimal way to represent all forms of information was 80 characters of EBCDIC as it was the "most efficient".
I went a few rounds with him at the time, and we agreed to disagree -- as long as I did it his way.
Glad that things have changed!
-- Chuck
Posted by: Chuck Hollis | August 28, 2009 at 12:39 PM
@chuck
Hmm...
I am being compared to a dinosaur... Someone who is stuck in the mud ... Lacking any foresight ...
Hmm...
cheers,
kostadis
Posted by: kostadis roussos | August 28, 2009 at 04:33 PM
Kostadis -- no -- you're the guy who's thinking out-of-the-box, remember?
-- Chuck
Posted by: Chuck Hollis | August 28, 2009 at 04:38 PM
As the "competitive blogger over at NetApp" I can safely say that my vested interest is in working out what you mean. Not disagreeing with you, or agreeing with you, to suit some agenda I may have; just cutting through the medium and getting to the message.
I'm struck by the fact that you (and Paul Carpentier from what I can see) are interested in throwing away what you see as excess baggage from a hierarchical name & address space view of the world. Your assertion that
[quote]There's also the fact that information objects will usually want to move their physical location during their lifetime -- not an architectural problem for object-based information stores.
[/quote]
and your reply to Kostadis
[quote]
I disagree with your assertion that "ultimately any object service will need some kind of name lookup service, and that name lookup service to scale to billions of objects will rely on some kind of hierarchical scheme which will end up looking like a directory."
[/quote]
needs an example, because I'm struggling -- really struggling -- to come up with the counterexamples that you claim exist in the industry. Or how you might do this without the aid of DNS, a hierarchical namespace if there ever was one...
Glad, btw, that you eventually got round to mentioning REST, and to the notion that real LUNs aren't necessarily better than fake ones. We'll make a real virtualised storage convert and WAFL fan out of you yet!
Posted by: Alex McDonald | August 31, 2009 at 06:29 AM
@Chuck:
'So, maybe the title of the original post should have been "The Cloud Doesn't Have A File System".'?
Well, maybe it should have been "The File System Doesn't Have a Clue"! as testified by untold junkyards of hierarchical pathnames pathetically (pun intended) trying to impersonate real metadata. ;-)
1. Hierarchical file systems will always get in the way of robust, automated, massively parallel scaling of storage
2. Hierarchical file systems are unable to provide the fundamentally unique references to immutable content or metadata capabilities required for long term storage
3. Hierarchical file systems are almost always more a hurdle than a help in modern application architecture based on RDBMSes where full path names need to be maintained rather than permanent unique identifiers that need to be stored once, at creation time.
4. Virtual, "projected" hierarchical views on top of object storage for legacy applications and some user populations may make sense in the short and medium term. Eventually, apps will go "object native" because it's simpler, more robust and much more scalable at lower cost, while end users are largely adopting objects (attachments) found through timeline and metadata (messages) search as their daily storage vehicle: the information contained in email systems like Outlook, GMail, Hotmail is already more important than the hierarchical folders on the desktop.
So yes Alex, you have read correctly that I am interested in throwing away excess baggage; hell, I just **love** to do exactly that. There simply is no role for infrastructure based on hierarchical file systems in the ubiquitous, always-on, connected world we're quickly becoming. They've made the lives of too many IT operatives miserable, they've overstayed their welcome, it's really time for them to go.
Ten years ago I mostly generated blank stares when preaching this. Now, I'm really glad that there are more people every single day that see the light and share this opinion. In the interest of a soundly competitive market, I sincerely hope that NetApp will join their ranks in time to make a difference.
Posted by: Paul Carpentier | August 31, 2009 at 07:11 PM