One of the more thought-provoking ideas from EMC World 2012 was embedded in Pat Gelsinger's keynote.
Once you got past the extreme (and enjoyable) production values -- and 42 new products -- I found myself thinking hard about what he had to say.
He presented a case for a fundamental shift in the "physics" of information technology.
Data has mass; big data has big mass -- and the resultant gravitational forces were causing a re-thinking from the familiar application-centric world to a data-centric one.
And the more I thought about it, the more I realized some of the very cool implications of Pat's observation.
To Begin With
To be absolutely precise, information (1s and 0s) doesn't really have mass in the traditional sense, but storing information does require energy to set state and resist entropy -- and since mass and energy are two forms of the same phenomenon (thanks, Einstein), I suppose there's a sliver of literal truth in there.
But Pat's claim isn't meant to be literal, it's intended to be metaphorical, so let's evaluate it on that basis.
Gravity is caused by mass, curving space in such a way that an attractive force is exerted: proportional to the masses involved and inversely proportional to the square of the distance.
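For the literal-minded, that's just Newton's familiar inverse-square law (included here only to make the metaphor concrete):

$$ F = G\,\frac{m_1 m_2}{r^2} $$

where $m_1$ and $m_2$ are the two masses, $r$ is the distance between them, and $G$ is the gravitational constant. Double the mass and the pull doubles; double the distance and it drops to a quarter.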
Here on Earth, we tend not to pay much attention to gravity, but -- once you leave our planet -- gravity becomes much more interesting. Interplanetary navigation, for example, is fundamentally all about overcoming, avoiding and leveraging large gravity wells.
Extreme mass causes all sorts of interesting galactic curiosities such as neutron stars, quasars, black holes, gravitational waves and -- of course -- great attractors.
At a cosmological scale, gravity is a force that shapes the structure of the universe.
Not to mention, current gravity theory appears to be incompatible with what we know about the rest of the universe via quantum mechanics.
Back to information: as a society, we've now started to amass mind-bending amounts of data for the first time in our history.
And -- as a result -- I think we're starting to see data gravity work in new and interesting ways.
Data Gravity And Viscosity
Pat used the term "viscosity" to describe gravitational effects. I tend to think of the effect in terms of gravity wells and needing to overcome physical forces.
A simple example would be a data migration from an old array to a new array. As we move from gigabytes to terabytes to petabytes (and beyond), we end up with ridiculously long times and amounts of effort to simply move a pile of data from one physical location to another.
Yes, networks are getting faster, but data growth appears to be seriously outstripping bandwidth growth.
Using a 10 gigabit Ethernet link as an example, an exabyte (1,000 petabytes) would take roughly 10,000 days -- about 27 years -- to move. And there are more than a few members of the "Exabyte Club" today.
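To make the arithmetic concrete, here's a quick back-of-the-envelope sketch (my own illustration, not from the keynote). It assumes a decimal exabyte and a 10 GbE link running flat out at full line rate; real-world protocol overhead, congestion and restarts only make the picture worse.

```python
# Rough sketch: naive bulk-transfer time at a given line rate,
# ignoring protocol overhead, congestion and restarts.

def transfer_time_days(data_bytes: float, link_bits_per_sec: float) -> float:
    """Days needed to push `data_bytes` over a link at full line rate."""
    seconds = (data_bytes * 8) / link_bits_per_sec
    return seconds / 86_400  # 86,400 seconds per day

TEN_GBE = 10e9                                      # 10 GbE, in bits per second
SIZES = {"1 TB": 1e12, "1 PB": 1e15, "1 EB": 1e18}  # decimal units, in bytes

for label, size in SIZES.items():
    days = transfer_time_days(size, TEN_GBE)
    print(f"{label}: {days:,.1f} days (~{days / 365:.1f} years)")

# 1 EB at full 10 GbE line rate works out to roughly 9,300 days (~25 years);
# add real-world overhead and you land in the ~10,000 days / 27 years range.
```

Faster links shift the numbers, of course, but a 10x bump in bandwidth buys you only one order of magnitude against data that's growing faster than that.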
It's not hard to visualize a deep, strong gravity well forming around all that data; one that requires enormous effort to overcome.
The more data, the stronger the forces at work around it.
Data Gravity Attracts Money ... And Talent
Pat shared some quick examples of newer, information-based businesses that have amassed stupefying amounts of data in the course of their activities.
These are just the tip of the iceberg -- there are many hundreds to examine if you're interested.
Some are relatively "pure plays"; others are information businesses embedded in more traditional ones.
We, as investors, can only value these newer enterprises in terms of traditional measures like revenue, profit and margin. We have no tools or frameworks to assess the value of the mountains of information they're sitting on.
It's almost like they've acquired mineral exploration rights to a million square miles of unexplored territory -- there's no idea of what might be lying underneath the surface.
I mean, how do you go about valuing 100 petabytes of social data, or 50 petabytes of health care claims, or 200 petabytes of consumer retail behavior data?
These very large "gravitational wells" are attracting some of the brightest people you'll ever meet. They're drawn into a world of massive, diverse and uncorrelated data sets. They want to explore and innovate in a way that human beings haven't been able to in the past.
My kids are college-aged now, and I end up spending a lot of time looking at universities. Many of them promote the size of their libraries (e.g. 17 million volumes) as a reason to attend. Personally, I can't remember the last time I went to a library to get a book.
How long will it be before universities advertise the size and breadth of their mashable data sets, ready for exploration by bright researchers? And how long will it take for us to develop the frameworks to measure the economic value of large information bases: both individually and aggregated with others?
Data Gravity Inverts The Relationship Between Applications And Information
Closer to home, there's a clear argument to be made for the growing importance of large, diverse information bases -- and for the shrinking importance of the applications that access them.
For an interesting personal take on this, see "Why Applications Are Like Fish, And Data Is Like Wine".
The move to SOA -- service-oriented architecture and its web-oriented derivatives -- was likely the first massive wave of decoupling applications from information. The advent of virtualization separated them further.
With the advent of big data analytics, the decoupling is complete and intentional; indeed, an argument can be made that the value of information tends to increase when it's considered outside the context in which it was originally created.
From my perspective, older themes become relevant once again in this new light.
Should information be on the balance sheet? Who looks after extracting the value from the aggregated information portfolio, much in the way that a CFO looks after other more familiar asset portfolios?
From a technologist's perspective, what will be the new approaches to managing massive amounts of potentially valuable information? How do we move it efficiently from where it resides to where it's needed?
From an IT perspective, it's almost like there's a new physical force to be dealt with -- the gravity of data.
And I think that was precisely Pat's point.
After reading Dave McCrory's original posts on Data Gravity:
http://blog.mccrory.me/2010/12/07/data-gravity-in-the-clouds/
http://blog.mccrory.me/2011/04/02/defying-data-gravity/
I wrote about the concept of Planet Data a few months ago http://www.cloudsofchange.com/2012/02/welcome-to-planet-data.html
and we spoke with Dave about the evolution of Data Gravity thinking on the podcast last week: http://www.thecloudcast.net/2012/05/cloudcast-eps38-project-razor-google.html
Once we understand the impact (the challenges of data movement), we can also begin to understand the opportunities that come from creating APIs around large amounts of valuable data -- unlocking new potential for customers, partners and researchers - http://www.thecloudcast.net/2012/02/cloudcast-eps32-apis-new-language-of.html
Posted by: Brian Gracely | May 31, 2012 at 08:30 AM
Hey Chuck! With respect to the library, I think you are hitting on something that drives me nuts about educational institutions--they are terrible at marketing! And to think, all they would have to do is throw an "e" in front of that Library name and it would make sense.
The Library, in its traditional sense, is where my son gets books for his weekly reading and where we take the kids to get books during the summer.
In my case, my university library is both a Single Sign-On (SSO) login that gets me electronic access to those same gravity wells you describe and a collection of links to external data sources for use in research. Admittedly, most of the linked data sources are public, which I think is more to your point. When will valuable data sets be available for better analysis?
Posted by: Branden Williams | May 31, 2012 at 09:28 AM
Chuck,
There is an interesting post by Nicholas Carr on the noise vs. signal aspects of data.
http://www.roughtype.com/archives/2012/05/a_little_more_s.php
The observation is that more frequent measurements increase the noise level, not the signal. If data sets are big because they contain more frequent measurements, you may find it a lot harder to understand the data than if the samples were taken less frequently. Or, to continue with the "gravity" metaphor: if you collect too much data, you get so much gravity that you end up with a black hole from which nothing escapes, including information.
In the end, businesses need to keep cost in mind. The fact that you can sample data every second may simply increase the cost of actually understanding it.
Interesting ...
Posted by: BRCDbreams | May 31, 2012 at 10:25 AM
Hi
The data science professionals and similar researchers I work with seem to be much more interested in correlating disparate data sources than in simply getting more data from the same source.
I suspect that Nick's observations wouldn't apply if that's the case.
-- Chuck
Posted by: Chuck Hollis | May 31, 2012 at 11:42 AM
Jer Thorp gave a TEDx talk recently about the weight of data. He comes at it from a visualizer's perspective - how to get a different kind of worth from the information. Worth a few minutes to check out: http://www.youtube.com/watch?v=Q9wcvFkWpsM
Posted by: Scott Lee | May 31, 2012 at 11:58 AM
Scott
Fascinating story. Well worth the time to watch. Thanks for sharing!
-- Chuck
Posted by: Chuck Hollis | May 31, 2012 at 12:37 PM