Over the last few centuries, much of mankind's ingenuity has been focused on eliminating the inconvenience of distance.
From sailing ships to modern air transportation; from hand-carried letters to global telepresence -- we spend a lot of time and money to overcome distance in the physical world.
And, in the next few years, there's going to be an intense focus in IT on doing exactly the same thing.
Why Is This Important?
I don't think most casual observers realize just how important distance is when considering IT at a global scale. The speed of light -- and its associated latency -- is not our friend.
Wikipedia tells us that light travels at about 300,000 kilometers per second in a vacuum -- and roughly a third slower in optical fiber. That sounds fast, but in the IT world we care about milliseconds, so that's about 300 kilometers per millisecond at best.
Most information transfer protocols require some sort of round-trip acknowledgment, and there's additional processing along the way, so a handy rule of thumb I've heard is that you add about a millisecond of latency for every 100 km of network length.
Doesn't sound like much, but it can add up.
You can see the effect if you've ever traveled to Singapore or Sydney, and tried to access a web application in North America or Europe. You may have plenty of bandwidth, but the latency can be downright annoying, especially for "chatty" applications.
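To make that rule of thumb concrete, here's a quick back-of-envelope sketch. The city-pair distances are rough great-circle figures I've picked purely for illustration:

```python
# A minimal sketch of the "~1 ms of latency per 100 km" rule of thumb quoted
# above. The rule already bakes in the round trip and protocol processing.
# City-pair distances below are rough great-circle figures, for illustration only.

MS_PER_100_KM = 1.0  # rule of thumb: ~1 ms of added latency per 100 km of path

routes_km = {
    "New York -> London": 5_600,
    "San Jose -> Sydney": 11_900,
    "London -> Singapore": 10_900,
}

for route, km in routes_km.items():
    latency_ms = (km / 100.0) * MS_PER_100_KM
    print(f"{route}: ~{latency_ms:.0f} ms per request/acknowledge cycle")
```

A "chatty" application that needs, say, 50 round trips per user action over the San Jose-to-Sydney path spends roughly six seconds just waiting on the wire -- and no amount of bandwidth fixes that.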
Many of us believe that the only practical way to generically solve this problem is to figure out how to get the information closer to the user.
When considering IT infrastructure at global scale, the same problem arises.
If you own multiple data centers in multiple time zones, you'd like to be able to pool your resources if possible. It's one class of problem if everything is in the same room; it's another class of problem entirely when considering doing this on a global scale.
Distance can also be your friend when considering physical data centers. The farther apart they are, the more protected you are from various worst-case business continuity scenarios. For very large companies, there's comfort in knowing that you can potentially conduct global business operations from any continent if needed.
Indeed, in a distance-agnostic world, we think about high availability differently, business continuity differently, resource pooling differently, load balancing differently. We take what we've learned around DRS and vMotion clusters and start thinking really big.
Overcoming distance -- in a cost-effective yet performant manner -- dramatically affects how we think about how many data centers we'll need, how big they need to be, where they need to be located, etc.
On a global scale, this affects many billions of dollars of IT infrastructure investment.
Indeed, as we talk about private clouds -- a dynamic, pooled mix of virtualized resources controlled by IT -- we need to start thinking about serious distances, and how we'll overcome the challenges they present.
The Cisco Whitepaper
This post was largely brought on by an excellent piece of work led by Cisco -- and supported by VMware and EMC -- that might have gotten lost in the VMworld avalanche. There was also a popular joint session on this at VMworld that Chad participated in; link here.
Basically, the paper characterizes what happens as you "stretch" the various networks to moderate (i.e. <200 km) distances.
Even though you'll see EMC storage in the configuration, very little of our storage functionality was being used -- all the "heavy lifting" here was being done by Cisco's network -- a 622 Mbps link stretched to 200 km.
They talk about three approaches: (1) "shared storage," i.e. the application moves and the storage doesn't; (2) "moved storage," i.e. the storage moves first, then the application; and (3) "active-active" storage, i.e. a dynamic combination of both.
More on item #3 in a bit.
They do a good job of characterizing latencies and resulting application-level performance as the distance increases. They also give you a good sense of how long it takes to move rather large storage objects around using vSphere as the "mover".
The good news was that application degradation for "shared storage" was much less than I would have expected, and the time it took to move moderately sized databases for "moved storage" was quite manageable.
I would credit Cisco's networking prowess for this one, especially their IOA (I/O acceleration) feature, which I think of as MPIO (PowerPath) for long-distance I/O pipes.
The sobering news was that the application workload was rather modest, the pipe was substantial, and the distances were only a small fraction of what we'll need going forward.
This is not meant to take away from the fine and very useful work done by all involved, it just shows the magnitude of the next mountain we're going to have to climb.
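To put those numbers in rough perspective, here's some back-of-envelope arithmetic of my own (these are not figures from the paper) on what a 622 Mbps pipe means for the "moved storage" approach:

```python
# Idealized transfer-time arithmetic for moving storage over a 622 Mbps link,
# as in the Cisco test setup. The 80% efficiency factor is my own assumption
# to account for protocol overhead; the paper's measured results will differ.

LINK_MBPS = 622.0   # OC-12-class link from the whitepaper configuration
EFFICIENCY = 0.8    # assumed fraction of line rate actually achieved

def transfer_hours(dataset_gb: float) -> float:
    """Hours to move dataset_gb (decimal GB) at EFFICIENCY of line rate."""
    bits = dataset_gb * 8 * 1000**3
    seconds = bits / (LINK_MBPS * 1000**2 * EFFICIENCY)
    return seconds / 3600

for gb in (100, 500, 2000):
    print(f"{gb:>5} GB: ~{transfer_hours(gb):.1f} hours")
```

Even under these generous assumptions, relocating a couple of terabytes is a many-hour proposition -- which is exactly why pre-positioning the data, discussed below, becomes so important.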
So where does that leave us?
Towards Information Logistics
The trick here is going to be all about pre-positioning the right amount of data in the right location at the right time -- and at the right cost.
If you've ever been involved with studying global spare-parts logistics networks, it's roughly a similar problem in terms of complexity. Put too much stuff at the endpoints, and it's expensive and out of date. Put too much stuff centrally, and the service delivery experience suffers -- not to mention that you spend a fortune on air transport.
Put the right stuff in the right place at the right time, and -- you win!! -- for the moment. It's never a static solution.
Stepping back into the world of IT architecture, I believe this "global information logistics" discussion will be very popular in a year or two. It's inevitable from where I sit.
You can see part of this thinking already in the marketplace from EMC today, e.g. EMC Atmos storage platform.
It uses a very flexible policy mechanism that allows relatively static information to be dynamically repositioned globally, driven by any of the following (a rough sketch follows the list):
- explicit service delivery intent (e.g. "I want to ensure a good experience at all times" using multiple remote copies),
- rapid shifts in demand (e.g. "this particular piece of information is getting very popular" so make more remote copies),
- redundancy (e.g. "never want to lose this" so make redundant copies),
- or cost objectives (e.g. "no one cares much about this piece of information", so compress it and spin the drives down).
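Here's a minimal sketch of what expressing such a policy might look like. To be clear, the field names and structure below are my own illustration, not the actual Atmos policy syntax:

```python
# A hypothetical "information logistics policy" covering the four drivers
# above. Field names and structure are illustrative only -- this is NOT the
# actual Atmos policy language.

from dataclasses import dataclass

@dataclass
class LogisticsPolicy:
    name: str
    min_replicas: int          # redundancy: "never want to lose this"
    preferred_regions: list    # service delivery: keep copies near users
    replicate_on_demand: bool  # popularity: add remote copies as reads spike
    compress_when_cold: bool   # cost: compress and spin down unpopular data

premium_media = LogisticsPolicy(
    name="premium-media",
    min_replicas=3,
    preferred_regions=["us-east", "eu-west", "apac"],
    replicate_on_demand=True,
    compress_when_cold=False,
)

cold_archive = LogisticsPolicy(
    name="cold-archive",
    min_replicas=2,
    preferred_regions=["us-east"],
    replicate_on_demand=False,
    compress_when_cold=True,
)
```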
What the Atmos storage platform enables is simple: a straightforward expression of "information logistics policy" that balances service delivery, cost and resiliency.
Which leaves us with two open questions: how do we also do this for "hot" data (e.g. a database), and how do we do this without having to get too granular in setting explicit policies?
Serious room for innovation here, I'd offer.
Painting The Big Picture
As our businesses learned to operate globally, we learned how to harness the power of global workforces to create economic and competitive advantage.
We learned to put the right work in the right place at the right time -- and create a global pool of human talent to power our businesses.
Will IT be any different? For those of us who operate at a global scale, will we learn to harness the power of global IT resources to create competitive and economic advantage?
Will we learn to put the right work in the right place at the right time -- and create a "global data center" to power our businesses?
In both cases, we had to get very good at overcoming distance.
Chuck, one clarification (just because we're already getting FUDed by the FUDers).
The Cisco/VMware paper was authored based on solution work done prior to Cisco Live this year. That used CX and V-Max, but no storage methods to accelerate the relocation of data between sites. Basically, it was a stretched LAN/SAN coupled with vMotion, and the work focused on VMware-level behavior.
At the time, key Cisco and VMware folks were already disclosed on our NDA technology, which applies to this challenge. After Cisco Live (where much of the question surrounded "well, what about the storage?"), we indicated that we felt the time was right to start sharing design goals and solution integration test data that incorporated storage technology into the problem/solution.
It was a two-month period between Cisco Live and VMworld. We extended the same test harness, added a "Storage vMotion before vMotion" characterization, and then also our advanced active/active geographically dispersed technology.
This data was shared at VMworld 2009 in session TA3105, which certainly DID leverage "our storage functionality".
Now, there are several ways to do this today, but most are confined to relatively short distances (works in only some geos), require very high bandwidth and VERY low latency links (works for only some customers), and generally involve "bounding" performance or lowering availability at each site (splitting an active/passive cluster). That's not to say they are bad -- every technology has strengths and weaknesses.
If people look at TA3105, you can see what we think the storage requirements for this use case are (at least the customers we talk to).
Just want to draw a distinction between the Cisco/VMware whitepaper and the three-way solution data discussed at VMworld 2009 in TA3105 (shown here: http://virtualgeek.typepad.com/virtual_geek/2009/09/vmworld-2009-long-distance-vmotion-ta3105.html).
Thanks!
Posted by: Chad Sakac | September 24, 2009 at 06:38 PM
@chad -- thanks for the clarification.
You are quite correct, my comments regarding use of storage functionality applied only to the Cisco white paper, and not to your session at VMworld.
However, I don't think we're ready to completely spill the beans on the "advanced storage functionality" you refer to, even though we're all itching to do so.
Here's to hoping that FUDers learn to spend their time on making their products better and meeting customer needs, rather than just needlessly muddying the water.
Thanks again!
Posted by: Chuck Hollis | September 24, 2009 at 06:53 PM
Distance has been a real challenge for human beings ever since we first occupied this planet. We have to be liberated from time to overcome the distance problem, since time and distance are like two sides of the same coin. Speaking of coin sides, the difference between liberty and tyranny is paper thin. Although we have enjoyed undivided liberty since the inception of this country, tyranny has always been but a hairsbreadth away.
I know tyranny intimately, since I spent my early years in South Korea before I came to the States. If we allow ourselves to become complacent and ignorant at any time, the liberty of our seemingly invincible and indivisible Republic can quickly become but a distant memory in the face of a very real tyranny.
“Overcoming Distance” or “Private Computing” is a great way to use computing resources efficiently. Imagine being able to utilize IT resources from India, China, or anywhere else at a moment's notice, and exchange those resources whenever you want. However, these great schemes have a fundamental and critical weak link: the geopolitical factor.
If we were ever prohibited from using the network due to geopolitical pressures, whether here in the States or overseas, cloud computing could become a true disaster. If you don't make provision for this eventuality, the essential functions of the great organizations that depend heavily on up-and-coming cloud computing at a global scale could be jeopardized overnight. The great “Overcoming Distance” organization that seemingly overcame geographical distance would become the “Unbecoming Distance” one in an instant.
Posted by: shiningarts | September 24, 2009 at 09:02 PM
Chuck, in line with "overcoming distance" we are interested in implementing something similar to this EMC publication:
http://www.emc.com/collateral/hardware/technical-documentation/h2981-emc-clariion-cx3-80metro-rcvry-sql-svr-ra.pdf
Now, the question I'd like to pose you.
That document reads very nicely, but it has very little technical substance in terms of "exactly how." Is it possible to talk to someone at EMC who has actually implemented this or is up to date on the technical minutiae?
We've tried the local EMC office but unfortunately all we got was more marketing-speak.
We want techno-speak.
Preferably from the folks who wrote the paper.
Can you lend a hand?
Thanks heaps in advance.
Posted by: Nuno Souto | September 25, 2009 at 12:34 AM
@nuno
Sorry for the frustration ... I've routed your interest to another group here at corporate who did the work. You should be hearing from someone before long.
If you don't, please drop me a line at hollis (underscore) chuck (at) emc (dot) com.
Thanks!
Posted by: Chuck Hollis | September 25, 2009 at 08:29 AM
Hi Shiningarts,
That is an interesting history. I spent enough time in Korea to understand it.
John F.
Posted by: John F. | October 15, 2009 at 01:33 AM