Many amazing things are being done in our EMC IT group these days.
The original premise was that -- once we had made enough progress on transitioning to an ITaaS model -- the value-added strategic IT initiatives would start to come hard and fast. Well, they are.
Almost everything IT does now is based on variable, re-usable (and competitive!) services: IaaS, PaaS, SaaS etc. IT friction is dramatically reduced as a result.
Business users are now marketed to, and not simply told what they can and can't have.
A financial governance model based on price signals and an internal marketplace is now starting to optimize IT decisions.
It's a very different IT game here at EMC than just a few short quarters ago. Perceptions have changed; enthusiasm is everywhere. And I get the privilege to share it with all of you ...
In a previous post (The Journey To Big Data Analytics), I outlined the process by which we thought most companies would likely acquire and leverage new analytical capabilities. At a high level, we now feel prepared to start our own internal journey in earnest, much like we're simultaneously tackling the challenge of mobilizing our extended enterprise.
In this post, I'd like to begin to share with you how we're organizing for success around this particular topic. Yes, we're using plenty of EMC technology (naturally!), but the real key to anything transformative is the organizational approach.
The Baseline
If you plotted the overall analytical proficiency of EMC on a bell curve, you'd find us better than many, but nowhere near best-in-class.
Yes, there's widespread use of operational reporting off of various aggregated data sources, like most companies.
More recently, we've seen a few hot spots in our business where we've applied data science and predictive analytical models, and have now seen the blinding potential.
As a result, we're now hooked.
We want to do more across our business, and do it fast.
For example, one of our in-house data scientists re-examined our disk drive reliability data sets in her spare time, and came up with a predictive model that's demonstrably better than what we were using. If you understand our storage business, you'll quickly realize that being able to predict drive reliability means a lot not only to us, but to our customers as well.
We're not quite sure yet what the eventual impact might be, but this one exercise alone easily could exceed hundreds of millions of dollars, not to mention improving our customer experiences.
A key point that should not be lost on anyone: the data scientist in question knew absolutely nothing about disk drives, how we used them, why they were important, how they were designed, etc. -- a complete "domain novice" who simply went looking for interesting patterns. She quickly came up with a better predictive model than the hundreds of very bright and expert people who have studied this particular topic for well over a decade.
Serious food for thought -- and sort of humbling for those of us who consider ourselves "experts".
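To make the idea concrete (this is purely illustrative -- it reflects neither EMC's actual model nor real drive data), a drive-failure predictor of this general kind can be sketched as a tiny logistic regression over made-up, normalized telemetry features:

```python
import math

def train_logistic(rows, labels, lr=0.1, epochs=2000):
    """Fit a logistic-regression failure predictor with plain gradient descent.

    rows   -- feature vectors (hypothetical: normalized temperature, sector errors)
    labels -- 1 if the drive failed within some horizon, else 0
    """
    n = len(rows[0])
    w = [0.0] * n
    b = 0.0
    for _ in range(epochs):
        for x, y in zip(rows, labels):
            z = sum(wi * xi for wi, xi in zip(w, x)) + b
            p = 1.0 / (1.0 + math.exp(-z))   # predicted failure probability
            err = p - y
            w = [wi - lr * err * xi for wi, xi in zip(w, x)]
            b -= lr * err
    return w, b

def predict(w, b, x):
    z = sum(wi * xi for wi, xi in zip(w, x)) + b
    return 1.0 / (1.0 + math.exp(-z))

# Toy, synthetic training data: [normalized temperature, normalized error count]
train_x = [[0.9, 0.8], [0.8, 0.9], [0.2, 0.1], [0.1, 0.2], [0.85, 0.7], [0.15, 0.05]]
train_y = [1, 1, 0, 0, 1, 0]
w, b = train_logistic(train_x, train_y)

hot_drive = predict(w, b, [0.9, 0.85])   # resembles the failed population
cool_drive = predict(w, b, [0.1, 0.1])   # resembles the healthy population
```

The point of the anecdote isn't the particular algorithm -- a domain novice with a generic technique like this, pointed at the right data set, can outperform years of expert intuition.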
Another example: one of our brighter IT architects started to mash up IT log data against what the more structured tools could do, and came up with a half-dozen amazing insights -- predicting capacity surges, and identifying key bits of infrastructure and process that were likely to have problems in the future -- and that's *before* we start talking about security logs.
OK, he had some good domain knowledge, but ended up just experimenting with stuff, looking for interesting patterns. He found a whole bunch, and it didn't take long.
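Again purely as a sketch (the data and threshold below are invented), one of the simplest forms of capacity-surge prediction is a linear trend fitted to a utilization series pulled from logs, extrapolated forward to see what crosses a comfort threshold:

```python
def linear_forecast(usage, horizon):
    """Least-squares linear trend over a usage series, extrapolated `horizon` steps out."""
    n = len(usage)
    xs = range(n)
    mean_x = sum(xs) / n
    mean_y = sum(usage) / n
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, usage))
             / sum((x - mean_x) ** 2 for x in xs))
    intercept = mean_y - slope * mean_x
    return intercept + slope * (n - 1 + horizon)

# Hypothetical daily % utilization scraped from capacity logs
daily_util = [52, 54, 55, 57, 60, 61, 63, 66, 68, 70]

projected = linear_forecast(daily_util, horizon=30)  # 30 days out
needs_attention = projected > 85   # flag volumes trending past a comfort threshold
```

Real tooling would do far more (seasonality, changepoints, per-system models), but even this naive trend line turns raw logs into foresight rather than hindsight.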
Another data science exercise was looking at our TCE (total customer experience) data sets, and coming up with useful demographics of customers who might *eventually* be dissatisfied with EMC for one reason or another -- perhaps even before the dissatisfaction started.
Think about that one -- a new predictive capability that allows us to start to address root causes *before* anyone actually says they're unhappy.
There's more, but it's not hard to make an immediate and compelling case that this data science stuff is going to transform the way EMC does business for the better. And, of course, we want to get there sooner than later.
At the center of this storm is Sean Brown, our Director of Enterprise BI.
Sean is very passionate about the new use cases and their amazing potential, and you can see him get rather excited when he starts to talk about this stuff -- justifiably so. Although he's been at EMC for over four years, he comes to us from Alcoa where he was instrumental in setting up their BI capabilities.
Now he's working for a technology vendor (EMC) that's making massive investments in big data analytics, which makes things even more interesting.
Goals And Motivations
Sean can quickly articulate a number of goals and outcomes the team is looking for.
First and foremost -- the big goal is to systematically increase the general level of big data analytical proficiency across the entire organization.
This, in turn, means (a) reducing most of the friction associated with obtaining data sets, tools and resources, (b) investing in education and training for users, (c) seeding the team with a few hard-core data scientist experts, (d) creating collaborative communities of like-minded people doing similar work, and (e) providing lightweight governance during a rather fundamental transition.
If you were asked to do a BIaaS project, you'd probably articulate (a) and perhaps fail to consider (b), (c), (d) and (e). This expanded view includes much, much more -- as we'll see here in a moment.
If one theme is around "maximizing good", there's a less dominant theme around "minimizing bad". We do a lot of "desktop data warehousing" around the company. It has its limitations from a user perspective, but if you look at it from a cost, lost-productivity and GRC perspective across thousands of users -- well, there are a lot of poor practices we can collectively eliminate.
I don't even want to think about how many people might have exported data sets outside the firewall surreptitiously to get something done :(
Exposing Data Sets
In terms of simple enablement, the team is using elements of the new Greenplum UAP to do a number of very cool things.
One theme is to surface as many data sets as humanly possible.
Data sets are largely provided "as is" -- there's no real attempt to sanitize, transform or otherwise cleanse the data prior to exposing it. Should someone have a need for hygienic, audited and verifiable data -- well, that's thought of as a special case vs. the generic case.
All data is presumed to be shareable with any EMC employee, unless there's a clear business reason why access should be restricted. The "DEFAULT == SHARE" posture is not only far easier to implement, but greatly facilitates experimentation with diverse data sets, which -- after all -- is the goal here.
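As a sketch of what a "DEFAULT == SHARE" rule might look like in code (the data set names and group names here are invented, not EMC's actual platform):

```python
# Hypothetical catalog of restrictions: a data set is open to everyone
# unless it appears here with the groups allowed to see it.
RESTRICTED = {
    "payroll": {"hr-analytics"},   # clear business reason: contains PII
}

def can_access(dataset, user_groups):
    """Default is share: deny only when the data set carries an explicit
    restriction and the user belongs to none of the allowed groups."""
    allowed = RESTRICTED.get(dataset)
    if allowed is None:
        return True                # DEFAULT == SHARE
    return bool(allowed & set(user_groups))

open_ok = can_access("drive-telemetry", ["engineering"])  # unrestricted set
hr_no = can_access("payroll", ["engineering"])            # restricted, wrong group
```

The design choice is the interesting part: the burden of proof sits with whoever wants to restrict a data set, not with whoever wants to use one.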
Data can either be imported into the environment, or exposed where it currently rests on another operational system. Data sets that are imported are obviously easier to work with; data that must be sourced from an external system takes a bit longer to get.
As data sets become more popular to the user community, provisions are made to make them natively available; less popular data sets are demoted to "virtual" status.
Users decide what's important -- not IT or some other governing body.
Exposing Data Policies
There's a serious and valid concern about either surfacing data inappropriately, or someone doing something ill-advised with a particular exposed data set.
The team has come up with a realistic approach: the platform surfaces data policy and recommendations -- even if the data set itself is not widely exposed. All data set consumers get clear communication around how the data set was generated, examples of appropriate and perhaps inappropriate uses, and who to contact if the data set has restricted access.
The BIaaS team is not responsible for defining and enforcing policies -- that's done elsewhere in the business -- they're just responsible for making sure that everyone knows the rules of the road. And I'm sure there will be a few inevitable bumps along the way, as there always are.
I think we're collectively OK with that :)
Exposing Toolsets
One thing I've come to appreciate around data science is that there's no "best tool".
Skillful practitioners use a variety of tools, and often get comfortable building their own. The approach is to surface the tools people are using today -- along with what they're good for, and how people are using them -- but clearly acknowledge that we'll probably have many, many more in our future.
For the first phase, the toolset consists of the usual licensed software candidates (e.g. SAS, SAP, etc.), but also all the open-source tools you find alongside a Hadoop distribution (e.g. R).
Exposing Expertise And Insight
Our friends at Greenplum have been advocating the need for collaboration and community amongst practitioners, and (corporately) we get it. Fortunately, we're pretty comfortable with the power of social collaboration principles here at EMC (take a bow, social team!), so it's not much of a conceptual or cultural leap for us to understand not only why it's important, but how to go about making it real.
From a platform perspective, once again we're fortunate in that we're using Greenplum Chorus (as a part of UAP) to quickly stand up fluid communities that provide full transparency around data sets, interesting outcomes, useful expertise and more.
In this environment, expertise and insight flows horizontally vs. up-and-down the org chart.
Driving Consumption
One of the more controversial tenets of anything-as-a-service is that you're chartered with driving productive consumption, and not burdened with rationing usage. If your service doesn't get used to deliver measurable business value, you aren't successful. Period.
The BIaaS team has a few "anchor tenants" they're working with. One is our corporate quality group. Fortunately, this team has been doing analytical work for many, many years. They've also got a plethora of rich historical data sets to "bench test" predictive analytical models. Needless to say, corporate quality is something that "moves the needle" for EMC.
A second anchor tenant is our sales operations group. We care a great deal about sales productivity and efficiency at EMC. The goal here is simple: to move from hindsight to foresight. Making any changes whatsoever in a sales operation model is a really big deal fraught with justifiable concerns, especially at scale.
Being able to model likely outcomes of proposed changes (based on past experiences) means that work towards optimization can accelerate -- there's the potential for much less friction associated with considering operational changes.
A third anchor tenant is our outbound marketing organization. We have marketing data, we have revenue data, we have customer satisfaction data -- what opportunities might there be to align all three?
And the potential for much, much more.
Governance During Transitions
If you think about it, we're fundamentally changing the way we use data across the business. Big potential, but not without risks.
As with past transformations, we've gotten very comfortable with the idea of establishing executive-level governance teams to provide high-level guidance (and maybe a bit of oversight!) as we learn new ways of doing things.
To be clear, these aren't project review meetings; they exist to surface the experiences so far, and to attempt to head off any potential policy concerns without slowing the pace of progress.
I had a similar experience many years ago when I was leading the charge for social media proficiency here at EMC. We formed an exec governance board along these lines, and it proved to be very useful in a surprising way: there was far more confidence in doing things in new ways simply because the governance function existed.
More confidence; faster results.
Where Are We Now?
The environment is now running in what I'd describe as an "advanced beta" state: groups are starting to do useful work on the BIaaS platform, and the supporting team is gaining valuable experience around how their internal "customers" are using the tools.
There will probably be a second round of anchor tenants, which will then be followed by an open-to-all announcement before too much time has passed.
I personally think that there is a notable division in user requirements starting to emerge, though.
On one hand, we've got a large population of small-scale experimenters emerging. On the other hand, we now have a handful of power-user data scientists whose needs (and potential value generated) greatly exceed those of the first population. We might need separate approaches for each audience over time.
The other wrinkle on the horizon is the best approach for sourcing external data sets. Internally, you've got the familiar challenge of who pays for something everyone can freely use. Externally, many of the more interesting data sets have rather restrictive licensing terms. I bet we'll be wrestling with this one before the year is out :)
Lessons Learned?
At one level, success is all about asking the right questions -- e.g. what are we really trying to achieve here?
When I was involved in social media, the "right question" was "how do we create widespread expertise in social proficiency that could change how we did business across the enterprise?"
And not just a better way of spamming people with advertising :)
When the EMC IT team started down the road to ITaaS, it was about optimizing IT service production for business consumption: learning to model IT after what competitive service providers do, and -- ultimately -- increase the consumption of IT for the benefit for the business.
More recently, the EMC IT mobility team focused on increasing productivity by delivering a superior user experience: for our employees, partners and customers.
I suppose the success of any "answer" depends largely on the question :)
Chuck,
Predictive analytics is starting to remind me of the pre-crime capability imagined in the movie "Minority Report". If only they knew then how close they were to a compelling service :)
Scott
Posted by: Scott Harden | January 25, 2012 at 06:53 PM