A long while back, I wrote a speculative post around what I dubbed "information logistics" -- the disciplines associated with getting the right information in the right place at the right time -- and at the right cost.
Our current global economy runs on a bedrock of well-executed logistics: food, products, energy, travel, defense, etc. It's not that far a stretch to assume that in our nascent digital economy, the same disciplines will be required.
Now, two years later, I think the case is far more compelling …
Logistics In The Physical World
In most developed economies, you simply get used to having what you want readily available. Just walk into any supermarket or retail store. Or jump online to get what you want.
Behind our simple consumption model, there's a fascinating network of supply chains all managed by logistics. It works so well, sometimes we forget just how complicated and sophisticated it really is.
Pick up that smartphone, and think for a moment about where all of its components and raw materials might have come from around the globe.
I had a very brief fling with logistics many years ago, and came away realizing it's an interesting problem in optimizing against multiple competing constraints.
And I think we're going to reinvent it before long in the digital domain.
A Simple Thought Example?
Let's say you were in charge of running a global supply chain. You can't make any money unless you have product to sell, so you certainly don't want to run out of product.
Your first thought may be to create warehouses in various locations. But that comes with several costs. There's the cost of the warehouses, the cost of the inventory, labor -- and the fact that inventory gets stale and often needs to be updated.
You might instead go the other direction -- fast transport. Book capacity on fast jets, and minimize your inventory requirements. But using jets is expensive as compared to other transportation modes, and there's still some latency involved. Not to mention, you're now entirely dependent on your transport provider.
Making matters worse, there are several factors you can't control. Demand for your product, for example, is hard to predict. So is disruption among your suppliers, or a regional event like extreme weather.
You can protect yourself against these risks, but those protections come with a cost of their own.
The good news is that -- in the physical world -- the discipline of logistics is well-understood. Maybe some of the inputs change, but the process of finding an optimization point is familiar.
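To make "finding an optimization point" a bit more concrete, here's a deliberately toy sketch. It assumes (my assumption, purely for illustration) that each warehouse has a fixed cost and that total transport cost shrinks in proportion to the number of warehouses. Real logistics models are far richer; the function names and numbers are hypothetical.

```python
def total_cost(n_warehouses, warehouse_cost, transport_cost_single_site):
    """Toy model: fixed cost per warehouse plus transport cost that
    falls as 1/n when sites move closer to customers (an assumption)."""
    return n_warehouses * warehouse_cost + transport_cost_single_site / n_warehouses


def best_warehouse_count(warehouse_cost, transport_cost_single_site, max_sites=50):
    """Brute-force search for the warehouse count with the lowest total cost."""
    return min(
        range(1, max_sites + 1),
        key=lambda n: total_cost(n, warehouse_cost, transport_cost_single_site),
    )


if __name__ == "__main__":
    # Hypothetical figures: 100 per warehouse, 10,000 transport with one site.
    print(best_warehouse_count(100, 10_000))
```

Too few warehouses and transport dominates; too many and the fixed costs do. The interesting part is the point in between, and it moves whenever the inputs change.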
Now Let's Move To The Digital Domain
Now, instead of physical stuff, let's imagine information -- an exponentially growing stash of 1s and 0s.
As before, we can imagine multiple producers and consumers in the supply chain, all scattered around the globe. And none of them want to wait for their information.
The farther the consumer is from the producer, the more latency rears its ugly head -- not to mention bandwidth costs. Applications want to be close to their data.
Your first instinct is to create warehouses -- storage, in this model -- and use those distributed repositories to improve access. But those repositories come at a cost, and -- besides -- the "inventory" of information is being continually updated.
Perhaps your next instinct is to instead invest in really fast networks. Sure, you can move a lot of data, but the speed of light is a constant. Not to mention, those big pipes can get expensive -- and they occasionally have a bad day.
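A quick back-of-the-envelope sketch shows why the speed of light matters here. It assumes light travels through optical fiber at roughly 200,000 km/s (about two-thirds of its speed in a vacuum) and ignores routing, queuing, and protocol overhead, so real round trips will be slower. The function name and the example distance are mine, chosen for illustration.

```python
# Approximate speed of light in optical fiber (an assumed round figure).
FIBER_SPEED_KM_PER_S = 200_000.0


def min_round_trip_ms(distance_km, fiber_speed_km_per_s=FIBER_SPEED_KM_PER_S):
    """Lower bound on round-trip time in milliseconds over a fiber path.

    Only propagation delay is counted; no switching or software overhead.
    """
    return 2 * distance_km * 1000.0 / fiber_speed_km_per_s


if __name__ == "__main__":
    # A hypothetical 10,000 km path between a producer and a consumer.
    print(min_round_trip_ms(10_000))  # 100.0
```

No amount of bandwidth buys that time back, which is why placing data closer to its consumers remains part of the answer.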
Risk mitigation plays a factor here as well: the more redundancy, the more protected you are -- but that comes at a non-trivial cost.
This Is Not A New Problem, Or Is It?
It's not hard to make an inventory of different approaches to solve these challenges: from brute-force replication to sophisticated caching a-la-CDN to simply doing nothing. But what worked in the past may not work in the future.
For starters, we're beginning to use information differently. Yes, we're all addicted to rich content on our mobile devices, but there's more over the horizon: a world of big data and the internet of things.
Our initial instinct -- bring everything back to a centralized location -- is at odds with the highly distributed nature of where information is being generated, and how it will be consumed.
One school of thought is to embed the required logistics logic in the application itself. While this definitely punts the problem out of the infrastructure domain, it leads to very lengthy application development cycles -- after all, modern app development is about consuming services, and not reinventing them over and over.
Making this approach even less desirable is that the traditional 1:1 relationship between Application A and Information A is quickly becoming deprecated. The new pattern is multiple producers and consumers against a shared information base, meaning we'll need underlying services that do the heavy lifting.
It then starts to get interesting. For example, it's safe to assume that not all producers and consumers of a shared data pool will want to see their data in the exact same format. Application A wants to record an event, Application B wants to do analytics against it.
Outside the technology domain, it's easy to see that governments around the world are starting to take a stronger interest in information flows across their borders -- just like they have an interest in the transport of physical goods.
We Need A (Storage) Service
Like any important IT service, you'd ideally implement it as far down in the stack as possible. In this case, since we're talking about information, the storage layer becomes a compelling candidate.
This is not a new idea -- while object storage has been around for a while, a few implementations embody notions of "information logistics" via policy expressed through metadata, e.g. this object should be immediately replicated to N locations, this other object should be dispersed in shards to improve resiliency, this other object really isn't that important, this object shouldn't cross a certain geographical border, and so on.
If you're at all familiar with EMC's Atmos, you'll recognize the construct -- it's been around for a while. But while Atmos has been successful enough, surprisingly few people take advantage of the very powerful logistics mechanisms that can be enabled programmatically.
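To illustrate what "policy expressed through metadata" might look like, here's a minimal sketch. It is not the Atmos API or any real product's interface; the `Site`, `ObjectPolicy`, and `plan_placement` names are hypothetical, and it covers only two of the policies mentioned above: a replica count and a geographic restriction.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Optional


@dataclass(frozen=True)
class Site:
    name: str
    region: str


@dataclass(frozen=True)
class ObjectPolicy:
    """Hypothetical per-object metadata describing how the object is handled."""
    replicas: int = 1
    # None means the object may be stored in any region.
    allowed_regions: Optional[FrozenSet[str]] = None


def plan_placement(policy: ObjectPolicy, sites: List[Site]) -> List[Site]:
    """Pick the first `replicas` sites that satisfy the region restriction."""
    eligible = [
        s for s in sites
        if policy.allowed_regions is None or s.region in policy.allowed_regions
    ]
    if len(eligible) < policy.replicas:
        raise ValueError("not enough eligible sites to satisfy the policy")
    return eligible[: policy.replicas]


if __name__ == "__main__":
    sites = [Site("fra", "EU"), Site("nyc", "US"), Site("dub", "EU")]
    eu_only = ObjectPolicy(replicas=2, allowed_regions=frozenset({"EU"}))
    print([s.name for s in plan_placement(eu_only, sites)])  # ['fra', 'dub']
```

The point is that the application only states intent in metadata; something lower in the stack decides where the bits actually go.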
But, even if object-style storage might be tempting, there are pragmatic concerns at hand: not every data presentation type is amenable to having object-style metadata associated with it. LUNs, for example. Files and filesystems. Transactional updates. HDFS. Graphs and key-value pairs. The world doesn't neatly fit into today's object paradigm.
One reasonable prediction is that we'll find far more object concepts finding their way into ostensibly non-object data stores -- or, at least, the notion of embedded actionable metadata.
Speed Is The Need
The cost points that shape how we think about these things are changing as well. Networks keep getting cheaper and offering more bandwidth, but memory is getting cheaper too.
As memory prices fall, smart caching strategies become more interesting -- if you've got the algorithms to pull it off.
As just one example from EMC's portfolio, VPLEX does a great job of dissolving distance for hot block data through the use of fiendishly clever caching algorithms -- and healthy gobs of memory and CPU to support it. The first time you see it work, it's like a magic trick you can't explain.
Current VPLEX semantics only support block data types, though. It's a fascinating starting point, but we'll inevitably need more generic cache-federation-at-distance solutions with richer semantics.
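For readers who want a feel for the basic idea of trading memory for distance, here's the simplest possible caching sketch: a least-recently-used (LRU) cache. To be clear, this is a textbook technique I'm substituting for illustration; it says nothing about how VPLEX works internally, and the class name is hypothetical.

```python
from collections import OrderedDict


class LRUCache:
    """Keep the most recently used items in memory, evicting the oldest."""

    def __init__(self, capacity):
        if capacity < 1:
            raise ValueError("capacity must be at least 1")
        self.capacity = capacity
        self._items = OrderedDict()

    def get(self, key, default=None):
        if key not in self._items:
            return default  # a miss: the caller would fetch from the remote site
        self._items.move_to_end(key)  # mark as most recently used
        return self._items[key]

    def put(self, key, value):
        self._items[key] = value
        self._items.move_to_end(key)
        if len(self._items) > self.capacity:
            self._items.popitem(last=False)  # evict the least recently used


if __name__ == "__main__":
    cache = LRUCache(capacity=2)
    cache.put("block-a", 1)
    cache.put("block-b", 2)
    cache.get("block-a")      # touch block-a so block-b becomes the oldest
    cache.put("block-c", 3)   # evicts block-b
    print(cache.get("block-b"))  # None
```

Cheaper memory means a larger `capacity`, which means fewer trips across the wire. The hard part, keeping caches at multiple sites coherent, is exactly what this sketch leaves out.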
Plenty Of Room For Innovation
As the world changes, new solutions are demanded, which spurs innovation.
Our world is changing: much more information, many more geographically dispersed producers and consumers, and persistent intolerance for latency and unavailability.
Oh, yeah, and cost matters too.
An important part of the software-defined storage discussion (in all its noisy forms) is a framework for the creation of new data services. Before long, I would expect strong demand for information logistics services: simple mechanisms to express policy, engines that move and cache, and frameworks that report back on the efficacy.
The New Logistics Manager
Buried inside any global enterprise that deals in the physical world, you'll find a sharp logistics team, armed with the best tools, watching the flow of goods minute-to-minute around the globe.
They're a key part of what makes the physical economy work so well.
In the new information economy, can we really be that far away from needing the same thing?
----------
Interesting view. I would suggest to add the issue of digital e-waste, e.g. bits and bytes nobody wants anymore as well. We can apply waste management policy of the physical world as well
Posted by: Nick Bakker | June 29, 2013 at 09:03 AM