Today's Twitter stream is chock-full of commentary from GigaOM's Big Data conference in NYC. Add that to the previous Strata conference, and it's easy to see that interest in Big Data is rising, and sharply.
Outside of those industry events, not only is the pace of conversations picking up rapidly with customers and partners, but the industry press is starting to show some interest as well.
I thought I'd share with you an email interview I just did on the topic. Maybe it'll get published, maybe not -- but I thought a few people might get some value out of it.
What's this Big Data thing all about?
You'll hear many, many opinions, but I think it's mostly about the growing recognition that there are all sorts of new value-generation opportunities that can arise from assembling unprecedented amounts of information.
It's a sharp departure in mindset from traditional IT applications. And, wherever we look, we're finding more and more examples across just about any industry you'd care to mention.
For me, it's a clear sign that -- in the information economy -- big data will likely be the platform that supports that new business initiative, that new competitive advantage, that new source of revenue, that new way of doing research, and so forth.
It's quickly becoming "the new thing".
Ask any business or organizational leader "what could you do with massive amounts of information?" and you'll likely hear a long list of things they'd like to go do -- if they could!
So isn't this simply a reaction to all of us creating massive amounts of information?
Yes and no.
It's true, we as a society are creating unprecedented amounts of information, and it shows no signs of slowing down anytime soon. But the "big data" perspective sees this as an opportunity, rather than a problem to be addressed.
There's value in all that information -- if we look at it the right way. And the case for a big data approach will only increase over time.
Haven't we had this discussion before? E.g. data warehousing, content repositories, databases, big file systems, etc.?
Sure, people have been using a variety of means to harness value from information for as long as there have been computers. So it's a fair question -- what's different now?
I see it as sort of a "perfect storm".
One important wave is the stupendously declining costs for infrastructure: compute, network and of course storage. An order-of-magnitude drop in costs means that there is a corresponding order-of-magnitude new set of opportunities for big data applications that weren't economically feasible before, and now are.
And let's not forget the cloud and virtualization -- these new forms of cost-effective infrastructure are far easier to operate -- and consume! -- than they were even a few short years ago. In one sense, the move to cloud (whether private, public or hybrid) sets the stage for big data.
The second important wave is that there's exponentially more raw information available to gather and harvest. Obviously, the vast majority of information is now "born digital" vs. created on paper. Add in social feeds, clickstreams, sensors and cameras everywhere, email traffic, etc. and there are now unimaginably vast rivers of information available to harvest, with much more coming every day.
It's a little staggering to consider it all.
And I think the third wave is the new generation of enlightened entrepreneurs and leaders who recognize these fundamental shifts, and are highly motivated to capitalize on the two trends above. There's enormous value in information; so it follows that exponentially more information can result in exponentially more value.
None of these trends show any sign of slowing down, either.
Isn't big data all about analytics at scale? Shouldn't we be calling this "Big Analytics"?
Sure, that's an important theme, but there's certainly more to the picture.
Not all big data applications involve analysis of structured data. Simply capturing and sharing large amounts of unstructured content that can be used in a variety of ways also qualifies as "big data", since there are a lot of parallels.
Indeed, some of the most fascinating applications I've seen are a combination -- large amounts of unstructured content, heavy metadata and powerful analytics that extract value from both.
Is the technology used in big data environments fundamentally different from that traditionally used in enterprise IT environments?
Yes, I think that's an important point. The technology approaches we see used in these environments are reasonably differentiated today, and I fully expect them to get more differentiated over time.
Take something as fundamental as the notion of scaling. It means one thing in a traditional enterprise IT environment, and something completely different in the world of big data. A nominal big data environment will end up storing many times as much information as you'd find in your average enterprise.
That quickly gets you into a discussion of scale-out, shared-nothing architectures with massive parallelism -- very different than what you typically find in a traditional IT environment.
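To make the scale-out, shared-nothing idea a bit more concrete, here is a minimal sketch in Python. It is only an illustration under stated assumptions, not a description of how any particular product works: the function names (`partition`, `local_aggregate`, `scale_out_sum`) are hypothetical, and threads stand in for what would be separate machines, each with its own storage. Every record is routed to exactly one "node" by hashing its key, each node summarizes only its own shard, and the small partial results are merged at the end.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
import zlib


def partition(records, num_nodes):
    """Route each (key, value) record to exactly one node by hashing its key.

    Hypothetical helper: crc32 is used only because it gives the same
    answer on every run, so a key always lands on the same node.
    """
    shards = [[] for _ in range(num_nodes)]
    for key, value in records:
        node = zlib.crc32(key.encode("utf-8")) % num_nodes
        shards[node].append((key, value))
    return shards


def local_aggregate(shard):
    """Work done by one node, touching only the data it owns."""
    totals = Counter()
    for key, value in shard:
        totals[key] += value
    return totals


def scale_out_sum(records, num_nodes):
    """Partition the data, aggregate every shard in parallel, merge the results."""
    shards = partition(records, num_nodes)
    # Assumption: worker threads play the role of independent nodes here.
    with ThreadPoolExecutor(max_workers=num_nodes) as pool:
        partials = list(pool.map(local_aggregate, shards))
    merged = Counter()
    for partial in partials:
        merged.update(partial)  # adds the per-node totals together
    return dict(merged)


if __name__ == "__main__":
    events = [("clicks", 3), ("sensor", 5), ("clicks", 4), ("email", 1)]
    print(scale_out_sum(events, num_nodes=4))
```

The point of the design is that adding capacity means adding nodes: no node ever needs to see another node's data, so the per-node work shrinks as the node count grows, and only the compact summaries travel over the network.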
Not surprisingly, administration and management have to be orders-of-magnitude easier and simpler -- simply because you've got orders-of-magnitude more information to deal with.
As an example, if you look inside something like EMC's Isilon or Atmos product, or even the Greenplum data analytics platform, you'll quickly see that they're built very differently than traditional storage or data warehouses.
Could you do big data applications with architectures designed for a traditional IT environment? I suppose you could. But purpose-built architectures are already showing clear and meaningful advantages -- especially at scale, which is what we're talking about -- and I'd expect that trend to increase over time.
So, if the technology is different, is the IT function different as well?
Yes, and that's perhaps even a more important point when you think about it.
When we work with a big data-oriented IT team, they're usually tightly aligned with the people who are using the environment to do their work. It's not like the users are way over in one part of the org chart, and the IT folks are way over in another part of the org chart.
They're not only part of the same team, very often they're co-located as well. The mission of the IT team (in addition to keeping things running!) is to empower the knowledge workers and data scientists who are chartered with extracting value from all that information.
You'll see all sorts of cool things like self-service portals, just-in-time provisioning, and so forth. There's a strong priority on agility and responsiveness. It's far more cloud-like, and very much unlike traditional IT orientations.
In a nutshell, in a big data environment, IT is there to enable the core mission. You'll sit in a meeting, and it's often hard to tell who are the IT people, and who are the business users!
The best part is that they're usually having a great deal of fun doing some very cool things.
There's got to be more to big data than just infrastructure, right?
Yes. Let me attempt to draw a sort of imaginary stack for you.
At the bottom, there's infrastructure, of course -- storage, compute, network -- all very virtualized, and typically built around scale-out principles. That, of course, includes operational processes that make it exceptionally easy to manage and deliver services in a cloud-like fashion.
On top of that, there are usually one or more software platforms that make the information organized, accessible and usable -- a data warehouse, a content repository, a file system, an object manager, a metadata manager or something similar. Typically, these are built on the same scale-out principles as the underlying infrastructure.
One level up you'll find a rich suite of analytic application tools -- all of which are usually reasonably computationally intense, and frequently I/O bandwidth intense. That's a long discussion in itself.
Perhaps the most important layer is the next one up -- presentation, collaboration and workflow. After all, the entire environment usually exists to benefit the knowledge workers and specialized data scientists who are chartered with extracting value from big data. And these people typically aren't as effective in isolation.
One part of the discussion that's frequently missing today -- but will ultimately be much more important -- will be the need to surround these environments with a strong combination of security and GRC (governance, risk and compliance) management.
If you think about it, the whole exercise around big data is assembling massive quantities of high-value -- and potentially high-risk -- information. Very often, this information is pulled out of the context it was originally created in (and its associated security and GRC), and put into an entirely new context.
For example, think about the potential -- and the risks -- associated with collecting vast amounts of health-care records. Or credit card transactions. Or all the web sites you've visited in the last five years.
And the people looking to extract value from that information aren't always the ones who are concerned about the potential risks involved.
If I'm a business leader, how should I think about the big data opportunity?
The good news? Get the stack right -- and link it to a set of strategic objectives -- and you've got a powerful engine for sustained value creation. I think the hard part for many people will be (a) recognizing the opportunity, and (b) organizing for success.
But -- make no mistake -- the skills associated with extracting value from massive amounts of information (big data warriors?) are still rather scarce and unique. More and more bright people are being drawn into the field, but at present it's a rather precious set of skills and mindset.
Any specific examples you'd care to share?
One of the most common ones I encounter is the whole data warehousing discussion.
I think we all sort of know what data warehousing is all about -- it's been around for a while -- but right now the field is going through a major transition.
If you listen to the current business school thinking -- Tom Davenport et al. -- there's a strong theme around empowering business leaders with better analytical information in context, that in turn leads to better (or at least better informed!) business decisions.
Not to oversimplify, but broad-based analytics proficiency can be a powerful competitive weapon in so many business models that you'll encounter.
Now, compare that emerging business mandate with the traditional IT-centric view of data warehousing -- constrained resources, limited sources and quantities of information, difficult to use in a flexible and agile manner, and so on.
It's pretty clear that something's got to give. And I think it's already started to happen.
The more progressive approaches we're seeing are all around business enablement vs. providing a closed-end "IT solution".
Take a modern, flexible cloud-like infrastructure and add a modern data analytics platform. Feed it with multiple data sources, and not just a few. Make the environment very easy to consume in a self-service environment, with just about any analytics application that the business would care to use.
Wrap all of that with great capabilities to present, discuss and collaborate around the insights gleaned.
More importantly, start to invest in creating a group of "data champions" on the business side who can use this environment to create all sorts of interesting insight and business value, and -- hopefully -- show others how to do the same.
It's a definite change in mindset -- it's all about enablement to create new sources of value. And that can be hard for an IT leader -- or business leader -- to get their head wrapped around.
I would suspect that -- as a storage vendor -- EMC is investing heavily in this space.
You're right, we are. But -- since EMC is much more than a storage vendor these days -- the investment pattern is far broader. And it's quite a list -- I hope I do it justice!
I guess for starters, there would be our storage platforms targeted at big data applications -- Isilon for data that usually lives in one location, and Atmos for data that needs to be geographically dispersed.
Virtualization -- from VMware -- plays a strong role in making the IT infrastructure more efficient, more flexible and much easier to manage.
Our investment in Greenplum as a next-generation data analytics platform is certainly representative of a key investment. Being able to deliver that as software, or as part of an integrated stack -- such as the Greenplum Data Computing Appliance, or perhaps Vblocks from VCE -- makes the value proposition far easier to deploy and consume.
Our RSA division plays a very strong role in providing the capability to secure and manage GRC in these environments -- our DLP offerings as well as Archer GRC look to be a promising fit here.
Our IIG team -- information intelligence -- is very proficient at managing large amounts of structured and unstructured content. Both IIG and the Greenplum team also have strong capabilities in supporting the presentations, collaboration and workflow aspects of these environments.
Within EMC Consulting, we've amassed a critical mass of business-oriented consultants who can help our customers spot the opportunities, and organize effectively to reach them.
And finally, we've started discussions with a number of service providers who see an opportunity in delivering these capabilities as an external service.
There's still a lot to be done, but I think we've got a very strong initial position which should only strengthen over time. This industry trend will take a few years to unfold.
Any final thoughts to share?
For those of us in the IT business, we're always looking for ways that we can deliver more value back to the organizations we serve. It's what we all aspire to do.
When you dig into big data, you'll probably find just that -- a rare and compelling opportunity to redefine how the organization uses information in new and fascinating ways.
But, like all great things, it won't be simple and it won't be easy. It'll take strong leadership and commitment to depart from business as usual, and learn how to teach organizations to extract untold value from the information that is all around us.
Who said IT was boring?
Hi,
"Big Data" is slowly spreading into every single industry that exists today. Increasing digitization in these industries has in fact contributed to an increase in the amount of data, both structured (huge numbers of files in a single file format) and unstructured (data in different file formats). One industry that is already sinking under a huge data load is healthcare.
Governments across the globe have emphasised the need to mandate the Electronic Health Record (EMR) system. A huge amount of information in the form of e-prescriptions, images, scans, laboratory results and bills is being fed into the EMR. Many hospitals across the globe are allocating huge budgets to restructuring their whole technological infrastructure to support their growing data. This immediately raises a question in my mind.
Will the healthcare industry be the first to witness the “Big Data effect”, and will it serve as an initial playground for any market participant who would like to capitalize on Big Data?
A Short Introduction of Myself :
My name is Sunil Sasidharan, working as a Research Analyst with the Healthcare IT Sector in Frost & Sullivan. You can contact me @
Work Phone : +91 44 6681 4438
Mobile: 9894955362
E mail : sunils1@frost.com
Posted by: Sunil Sasidharan | March 24, 2011 at 07:44 AM
Luckily HP is also investing heavily in this space with storage, compute and software. We closed the Vertica acquisition yesterday, and we now have some of the largest web and media companies using the HP stack in its entirety. We're not content, though, and are working on some fantastic POCs where we are running analytics on the storage itself at some very large points of scale.
Posted by: Andy Sparkes | March 25, 2011 at 07:16 AM
HDS VSP is already getting a lot of traction, and new features like Dynamic tiering and 3D growth will definitely put pressure on the competition.
Posted by: vijay | March 31, 2011 at 09:51 PM
Andy and Vijay
Thanks for dropping by my personal blog and sprinkling it with competitive fertilizer. Let me know if I can return the favor sometime?
-- Chuck
Posted by: Chuck Hollis | April 01, 2011 at 09:30 AM