[note: this post has been updated with more accurate information here ...]
One of the most interesting storylines in the cloud debate has to do with Amazon.
While they were undoubtedly one of the first "name brand" public cloud services (and perhaps the best known), they continually struggle to adapt their offering to meet the needs of large enterprises.
Last Friday, Amazon held an invitation-only event to proclaim their "readiness" for large enterprises. The event was duly reported on by Carl Brooks for SearchCloudComputing.com.
But -- to my eye -- the event had precisely the opposite effect: it amply demonstrated why public cloud operators like Amazon will never have more than a small sliver of enterprise business until they change their ways.
To Start With
I'd encourage you to go read the article first, since I'm going to spend the rest of this post commenting on what was said.
Most of the article (and Amazon's marketing to enterprises) centers on Pfizer's use of AWS.
"Michael Miller, senior director at Pfizer Global Research and Development, said that his firm had embraced Amazon Web Services (AWS) only after the advent of Amazon's Virtual Private Cloud (VPC) service, which walls off resources from the public Internet. He said that it was a good choice for compute-heavy workloads, as long as it remained firmly under Pfizer's thumb."
My first observation is that Mr. Miller does not appear to be part of Pfizer's IT organization. I am reasonably sure that Pfizer's IT organization was not thrilled by the use of AWS. Indeed, it's quite likely that a group of R+D types went outside the organization and forced the situation.
Amazon's VPC offering is nothing unique: many service providers have offered the same "private network" capabilities for quite some time. It's nothing more than table stakes in any enterprise service provider discussion.
Also notice the "compute-heavy workload" characterization. In the landscape of enterprise IT, these tend to be few and far between. Indeed, most enterprise workloads are I/O and network heavy if anything.
The Story Continues
"Compliance is the bugbear. Does it actually allow us to deploy with the policies we've set?" he said. The answer was yes, but only in AWS VPC, and only after Amazon took pains to get a rigorous SAS 70 Type II certification and an extensive "cultural dialogue" with internal auditors.
"One of the things that was very useful was the SAS 70 report; we've also gone through several audits on our side and their side," Miller said. He added that auditors had to get used to virtualization technology before being able to successfully clear Pfizer's AWS operations."
Lots to comment on here.
If you're not familiar with SAS 70 Type II audits, I'd recommend you check here. Like most audits, it does not guarantee that an environment is secure, compliant, etc. -- it simply states that the required processes and controls are in place.
While SAS 70 audits are a reasonable starting place, they create three problems. First, if you've ever read one, their interpretation requires a great deal of expertise. Second, they only refer to results over a specific period of time, since -- by definition -- they are historical documents. Third, any process and/or controls are only as good as the people who enforce them, and that tends to be where the problems occur.
Note the wry comment on "extensive cultural dialogue". I think Mr. Miller misses a key point: most technology auditors I have met are comfortable with virtualization up to a point, namely the intermixing of different workloads owned by a single organization or tenant, e.g. Pfizer in this case.
What they're most definitely *not* comfortable with is intermixing of different workloads from multiple organizations in a single pooled infrastructure. Amazon's AWS model doesn't allow the segregation of workloads and infrastructure -- everything is one giant pool.
"He said Pfizer had built custom Amazon Machine Images, from the kernel up, so that it could verify integrity at every level; that kind of detail took the place of verifying hardware and physical integrity.
"We had to work with [auditors] to adjust their expectations…no, you don't get 'this chip, that box, that disk,'" he said. Instead, they get code review and kernel versions. Now that the heavy lifting is done, Miller guessed that Pfizer now consumes about 80% of CPU work on new projects in AWS, which makes up about 50% of total workload.
The hand-crafting of machine images is nothing unusual for organizations that take IT compliance seriously -- like Pfizer. Every hunk of code is vetted and change-controlled. Deviations are detected and reported on. EMC, in fact, sells tools that assist with this IT compliance capability.
However, this not infrequent requirement had to be painfully retrofitted into Amazon's environment.
The comment on "this chip, that box" etc. again misses the point: had Amazon been able to go back to Pfizer and say "this is your pool of physical resources for you to virtualize, no one else will use them", I would bet that the discussion would have been a lot more productive.
The workload figures are potentially misleading: Mr. Miller is most likely commenting on his particular group's use of AWS and not all of Pfizer. I read this as "half of all workloads in this group are new projects; 80% of those go to AWS", which would mean that approximately 60% of all workloads in Pfizer's Global Research and Development Group are done using more traditional means.
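A quick back-of-the-envelope check of that reading, as a minimal sketch. The two input shares are only my interpretation of the quoted figures, not data from Pfizer, and the variable names are purely illustrative.

```python
# Assumed reading of the quoted figures (an interpretation, not Pfizer data):
new_project_share = 0.50  # share of the group's total workload that is new projects
aws_share_of_new = 0.80   # share of new-project CPU work that runs in AWS

# Share of the group's total workload that lands in AWS under this reading
aws_share_overall = new_project_share * aws_share_of_new

# Everything else is assumed to run by more traditional means
traditional_share = 1.0 - aws_share_overall

print(f"AWS: {aws_share_overall:.0%}, traditional: {traditional_share:.0%}")
```

Under a different reading, say 80% of all CPU work going to AWS, the split would obviously look very different, which is why the quoted numbers deserve some caution.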
The Pfizer director also said there were limitations to what his company would run on Amazon. Any older applications were tied to hardware, and sensitive data would probably never leave the firm's networks. Miller did say there were significant operational savings, something other panelists agreed with.
The claim on operational savings rings true -- but that benefit is inherent in *any* service provider model, and not necessarily unique to Amazon's offering.
What I Think Happened
A few disclaimers before we get started. I think Amazon offers a fine service for its intended use -- my comments are not meant to discredit Amazon's AWS, only its appropriateness for the vast majority of enterprise-class workloads.
Similarly, I believe that Pfizer runs a fine IT organization. But I think this one got away from them.
Pfizer's primary business is developing drugs. One would think that Pfizer's global R+D function is quite privileged in many regards -- it's their work that makes money for the company.
I believe that -- many years back -- their needs for dynamic and on-demand compute/storage services weren't being met by the traditional IT organization. This is not a slam on Pfizer's IT organization -- the same situation generally exists in many organizations that have a pronounced R+D function.
I think the R+D group found AWS and ended up embracing its model and toolset. It was exceptionally easy to consume -- anyone could open an account. Applications got written that produced results. No need to go to IT budgeting and planning meetings -- just get on with it!
Usage of the service grew over time.
At the same time, I would be surprised if vocal concerns weren't being raised by other corporate functions: the compliance and security guys, the auditors, and mainstream IT. Their concerns, while likely valid, probably went unheeded by the researchers who now had "their" cloud to go use.
Time went on. Usage continued.
And eventually Pfizer was faced with a hard choice: either invest in making AWS compliant enough for the intended usage, or pull all those applications back from AWS and convert them to something else that was more controlled.
I'm sure this boiled down to two numbers: plan A and plan B. The implied level of effort stated in this article (just think about how long and hard this effort must have been for all involved) gives much credence to the "stickiness" of applications developed using the AWS toolset, and the difficulty associated with moving them to other environments.
Simply put, I think Pfizer got pregnant.
What Should Have Happened
I believe that Pfizer's R+D group has enough scale to matter: they could consume their own cloud, or at least justify having something special built to their liking.
Were we doing things over, I'd recommend that they build their environment in virtual machines (preferably VMware), using tools of their choosing that were associated with the VM, rather than a public cloud operator's unique service.
Now they'd have some interesting options.
First, they'd be able to stand up a critical mass of internal infrastructure behind the firewall -- should they choose to. I am no expert, but I would bet that there would be less extended grinding of gears with their compliance teams using this scenario.
Second, they'd be able to go to a compatible service provider and use a hosting or co-lo model to run their environment -- with no intermixing between Pfizer's workloads and other users. If building data centers or carrying assets on the balance sheet were a concern, I'm sure there are dozens of great SPs who would love to help them out with this.
Third, they'd get some options as to what to run internally or externally. It'd be basically the same application environment; the only decision would be where to run it. Compare this with Pfizer's AWS implementation: there's one and only one place that workload can run, and that's in an Amazon data center.
And no IT professional wants to be put in that sort of position.
What Does This Mean?
I see the Pfizer / Amazon story as essentially a one-off: very unlikely to be repeated by Amazon.
Far better options are available in the market today: private cloud architectures that are very easy to deploy internally, as well as many dozens of proficient service providers who can use the exact same technology to create a compatible and controlled environment that will undoubtedly have a far easier time satisfying compliance concerns.
Not only that, the ability of SPs to combine more traditional hosted and co-lo approaches with pooled virtualization is a big win: the economics of a cloud model without the compliance headaches.
I think this sort of story also will result in a new term being popularized: "cloud portability" -- the ability for a tenant of any external service to pack up and move to a new location without spending tens of millions of dollars for the privilege of doing so.
And finally, I can't imagine a more cautionary tale for any IT organization that falls behind the needs of its most valuable knowledge workers.
Sorry, one more thought.
The article also spends some time discussing how Newsweek is using AWS.
Understandably, any external service that uses a pure opex model might be appealing to them these days: http://www.dailyfinance.com/story/media/newsweek-is-for-sale-but-buyer-will-be-hard-to-find/19465620/
-- Chuck
Posted by: Chuck Hollis | 06/28/2010 at 10:14 AM
wholly agree, that most likely their R&D group went outside & forced Pfizer's IT dept to get in line. I've dealt with that in the past; the 1 benefit (especially for non production services) is the shift of responsibilities. If the outside service goes down (& is not network/connection related) then it's no longer IT's concern to get it up & running.
Though I understand the "this box, this chip, this storage" issue, & do have issue with it for certain apps. Shouldn't a Virtual based solution, monitor & migrate Guests to faster/slower CPUs based on their historical performance??
Posted by: P. Clarke Thomas | 06/29/2010 at 12:18 PM
Hi
With a private cloud model, it's all about the tenants being in control -- hence the enormous struggles to find a fit here between Amazon and Pfizer.
Given that thought, the enterprise IT group ought to have a say in how their "bucket of minutes" is used by an external service by setting appropriate policies -- one of which might be "migrate workloads with historical low perf requirements to a more cost-effective deployment" among other possibilities.
The key point here? The tenant should decide the policy (and pay for it!), and not the service provider.
Thanks for the comment ...
-- Chuck
Posted by: Chuck Hollis | 06/29/2010 at 12:56 PM