The economy seems to be picking up a bit, especially for us IT vendors.
As a result, we're getting more high-level queries from big IT shops than usual.
A particularly interesting storage RFI (request for information) crossed my desk last week. The long-term questions gave me an opportunity to elaborate a bit on what's new in the storage world -- both today, and in the near future.
I thought it'd make a good blog post for everyone. I've obscured a few details to protect confidentiality (both ours and theirs!), but -- otherwise -- you may find it a good read.
Bonus question: see if you can guess the industry segment.
Introduction
Dear __________
I am pleased to respond on behalf of EMC to your questions regarding storage technology evolution from the present to the next few years.
Before we get started, I'd like to note a few disclaimers.
First, any one of these individual topics is worthy of considerable extended discussion. To keep the overview brief, I have elected to share just the major industry themes as a starting point for further discussion.
Second, I regret I am not at liberty to pre-announce specific EMC capabilities and future developments without an appropriate NDA. However, most of the capabilities discussed in this memo are already generally available for production environments and I understand this is a prelude to further discussion.
Third, it's getting more difficult to have a storage-specific discussion in isolation from other parts of the IT landscape, e.g. server virtualization, cloud architectures and newer forms of application development. Any context you can provide regarding other aligned IT initiatives will greatly help in focusing the discussion.
EMC believes that the traditional lines and demarcations within IT are in the process of being redrawn, so any thorough discussion will likely range outside of storage into other disciplines.
Finally, I've added a few additional questions that we frequently get asked by similar customers for your potential consideration.
General Themes -- Storage In Transition
Having personally been in the storage industry for 15 years, I can confidently state that the next few years probably won't look anything like the last 15.
We are all aware of the rapidly growing amounts of information that must be stored and protected. Cost and efficiency pressures continue unabated, and new regulations specify how we must store information, and -- sometimes -- where it must be stored.
Not only are individual storage technologies evolving very rapidly, but there are entirely new operational models to now consider as well.
First, storage technology itself is evolving quickly -- from tape to disk, and from disk to flash. Newer approaches to storage efficiency -- data deduplication, compression, spin-down -- are quickly finding their way into mainstream storage applications. And new forms of storage functionality are getting better at putting the right information on the right storage at the right time.
Second, storage architectural models are quickly evolving from either monolithic or modular to scale-out clusters that -- in some cases -- will be geographically dispersed.
Newer storage deployment models are best thought of as dynamic pools that react to new business requirements, usually in concert with widespread virtualization of servers and desktops.
On top of these new technologies and architectures, storage service catalogs are constructed that offer different capabilities to the business at a wide range of costs -- with the added benefit that service levels can be dynamically adjusted up and down as conditions change.
Third, while storage itself is becoming far easier to manage, the focal point for more and more storage management is moving outside the storage domain -- either as an adjunct to virtualized server management or business application management, or as part of network service management.
Finally, storage must support more governance-related functions -- compliance with retention, location, protection, security, etc.
1. What will be the developments in the area of amount of storage/drive & storage/unit?
We see physical storage devices moving in two opposite directions very quickly.
On one hand, high performance storage requirements are quickly being met by enterprise flash drives, which have now been generally available for about two years.
From a pure technology perspective, enterprise flash makes a near-perfect storage medium -- approximately 30x faster than conventional disks, with very low power consumption and excellent reliability.
Costs associated with flash are dropping ~70% per year, making them progressively more attractive as a straightforward replacement for fibre channel drives with each passing month.
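To put that decline in perspective, here's a quick back-of-the-envelope sketch. The starting prices below are purely illustrative placeholders (not EMC or industry list pricing), but they show how quickly a ~70% annual decline closes the gap with a roughly flat-priced fibre channel drive.

```python
# Back-of-the-envelope: years until flash $/GB crosses below FC $/GB,
# assuming flash prices drop ~70% per year and FC prices stay roughly flat.
# The starting prices below are illustrative placeholders, not real list prices.

flash_per_gb = 30.0    # hypothetical starting $/GB for enterprise flash
fc_per_gb = 3.0        # hypothetical $/GB for a fibre channel drive
annual_decline = 0.70  # ~70% per year cost decline for flash

year = 0
while flash_per_gb > fc_per_gb:
    flash_per_gb *= (1 - annual_decline)
    year += 1
    print(f"Year {year}: flash at ${flash_per_gb:.2f}/GB vs FC at ${fc_per_gb:.2f}/GB")

print(f"Crossover in roughly {year} years at these (illustrative) starting points.")
```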
On the other hand, SATA and SAS drives are doubling in capacity at a regular cadence. Today's 2TB drives will be replaced by 4TB and 8TB drives. With each larger drive, we get roughly twice the capacity at roughly the same device cost.
The evolution of storage media is fairly well understood by the industry, but a question remains: how best to bridge the gap between a small amount of very fast (and expensive) flash and large amounts of very slow (and very cheap) bulk SATA capacity?
EMC believes that by dynamically assigning data segments to either flash or SATA (depending on usage patterns), we can get both the performance benefits of flash at roughly the economics of SATA.
EMC refers to this software technology as FAST -- Fully Automated Storage Tiering. We believe that it will fundamentally change the economics and performance of storage as it becomes available in early 2010.
The likely consequence of this development is that the trusty FC drive -- the current workhorse of the industry -- will become increasingly unappealing over time, replaced by dynamic pools of flash and SATA.
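To make the tiering idea a bit more concrete, here's a deliberately simplified sketch of the general concept -- watch access counts per data segment, keep the hottest segments on a small flash tier, and demote the rest to SATA. This is a toy illustration only, not the actual FAST algorithm, and the capacity figure is a made-up placeholder.

```python
# Toy illustration of automated tiering (NOT the actual FAST implementation):
# track recent access counts per data segment, keep the hottest segments on a
# small flash tier, and demote everything else to a large SATA tier.

FLASH_CAPACITY_SEGMENTS = 2   # hypothetical: only 2 segments fit on flash

def retier(segments):
    """segments: dict of segment_id -> recent access count.
    Returns dict of segment_id -> assigned tier."""
    ranked = sorted(segments, key=segments.get, reverse=True)
    placement = {}
    for i, seg in enumerate(ranked):
        placement[seg] = "flash" if i < FLASH_CAPACITY_SEGMENTS else "sata"
    return placement

# Example: segment C gets hot, so it displaces a cooler segment on flash.
access_counts = {"A": 950, "B": 12, "C": 400, "D": 3}
for seg, tier in sorted(retier(access_counts).items()):
    print(seg, "->", tier)
```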
(key EMC technologies: EFDs, FAST, V-Max, CLARiiON, Celerra)
2. When can we expect to handle PetaBytes/activity?
Today's workflows associated with very large datasets involve an enormous amount of information logistics -- loading terabytes or petabytes from tape, running long computational jobs that in turn may produce petabytes of data, and then saving the results back to tape.
Historically, EMC has worked with customers in these environments to do custom integration between their existing job scheduling / workflow environments and the underlying storage provisioning. Usually this takes the form of large filesystem pools, either using an EMC filesystem product, or in concert with offerings such as Ibrix or Lustre.
More recently, we’ve been able to incrementally improve on these models by introducing virtual provisioning techniques (storage is not allocated until actually used), and dynamic archiving of large data sets to more cost-effective storage.
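For readers less familiar with virtual provisioning, here's a toy sketch of the core idea -- the host sees a full-sized volume, but physical capacity is consumed only when a block is actually written. It's an illustration of the concept, not any particular EMC implementation.

```python
# Toy sketch of virtual (thin) provisioning: the host sees a full-sized volume,
# but physical blocks are allocated from the shared pool only on first write.
# Illustrative only -- not a specific EMC implementation.

class ThinVolume:
    def __init__(self, logical_blocks):
        self.logical_blocks = logical_blocks  # size advertised to the host
        self.allocated = {}                   # logical block -> data (lazily allocated)

    def write(self, block, data):
        if block >= self.logical_blocks:
            raise IndexError("write past end of volume")
        self.allocated[block] = data          # physical space consumed only here

    def read(self, block):
        return self.allocated.get(block, b"\x00")  # unwritten blocks read as zeros

    def physical_usage(self):
        return len(self.allocated)

vol = ThinVolume(logical_blocks=1_000_000)    # a "1 million block" volume
vol.write(42, b"payload")
print(vol.physical_usage(), "of", vol.logical_blocks, "blocks actually allocated")
```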
However, we think much more will be possible in the very near future.
Going forward, we’re extremely interested in attacking what we believe is the biggest part of the problem – rolling data on and off tape. As mentioned later in this response, disk-based replacements for tape libraries have reached the point where they are attractive alternatives in many environments.
Storing archival data on disk (vs. tape) also gives us the potential to automatically tier information up and down as it is being used, rather than moving it from one format to another. Applications and users would see a giant pool of information, the vast majority of which would be stored on giant SATA drives that are compressed, deduped, spun down, etc.
In this sort of approach, the storage array would dynamically move data sets of interest to higher performing media (e.g. flash, FC, etc.) automatically when used, and then move them back again when no longer being used.
(key EMC technologies: Celerra, FAST, Rainfinity)
3. What new storage technologies can be expected: also look at the role of solid state.
At a physical device level, we expect the current industry reliance on FC drives to give way to a split between enterprise flash drives and large-capacity SATA or SAS drives, as discussed in #1 above.
We believe the key to making these combinations of devices useful lies in the automated placement of data segments in the right place at the right time.
Dynamic spin-down of drives has been especially useful in larger storage farms, and has been proven to significantly cut power and cooling requirements when large portions of a storage farm are idle for a prolonged period.
4. What will be the data compression developments: what technologies can be expected?
Data reduction technologies (deduplication, single instancing, compression, etc.) are currently finding their way into the entire storage stack: application, database, operating system, file system, network, etc.
As CPU processing becomes progressively faster and cheaper, more and more types of compression are economically viable in multiple locations of the technology stack without impacting performance or imposing undue costs.
Due to the wide variation of data types and customer use cases, EMC believes that there will not be any single "best way" to achieve data reduction. As a result, EMC is currently investing in multiple approaches in backup, archiving, primary storage as well as storage networking.
Many of these use cases (e.g. backup, archive, replication) are showing very attractive results today; other use cases (e.g. extremely active primary data) must wait for future advances in processor technology to become equally attractive.
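As a concrete illustration of the most common data reduction building block, here's a toy chunk-and-hash deduplication sketch: split incoming data into chunks, fingerprint each chunk, and store only the chunks you haven't seen before. Real products differ enormously in chunking, indexing and scale; the chunk size below is absurdly small just to keep the example readable.

```python
# Toy illustration of deduplication: split data into chunks, hash each chunk,
# and store only the unique chunks. Illustrative only -- real backup, archive
# and primary storage products vary widely in chunking and indexing.

import hashlib

CHUNK_SIZE = 8  # absurdly small, just for the example

def dedupe_store(data, store):
    """Store data as a list of chunk fingerprints, adding only new chunks."""
    recipe = []
    for i in range(0, len(data), CHUNK_SIZE):
        chunk = data[i:i + CHUNK_SIZE]
        fp = hashlib.sha256(chunk).hexdigest()
        store.setdefault(fp, chunk)   # identical chunks are stored exactly once
        recipe.append(fp)
    return recipe

store = {}
recipe1 = dedupe_store(b"ABCDEFGHABCDEFGH", store)   # two identical chunks
recipe2 = dedupe_store(b"ABCDEFGH12345678", store)   # one chunk already known
print("logical chunks:", len(recipe1) + len(recipe2), "unique chunks stored:", len(store))
```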
(key EMC technologies: Avamar, Data Domain, RecoverPoint, Centera)
5. What will be the snap vaulting and other technologies to keep backups of prime data?
Storage snaps are very mature technologies these days, widely supported by most storage vendors. As part of an overall backup and archiving strategy, they are an important weapon in the arsenal. Generally speaking, snaps are useful for quick recoveries of relatively recent data; as data ages, they become less appealing.
The most recent (and popular) variation of snaps can be thought of as a "continuous snap", also referred to as CDP (continuous data protection), which creates a playback log of data changes allowing recovery to an arbitrary point in time, often to a remote location.
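Conceptually, CDP is just a timestamped journal of writes that you can replay to any point you like. Here's a toy sketch of that idea (illustrative only -- real CDP products add consistency groups, bookmarks, remote copies and much more).

```python
# Toy sketch of continuous data protection (CDP): every write is appended to a
# timestamped journal, and recovery replays the journal up to any point in time.
# Illustrative only -- real CDP products do far more.

journal = []  # list of (timestamp, block, data), appended in time order

def record_write(ts, block, data):
    journal.append((ts, block, data))

def recover_to(point_in_time):
    """Rebuild the volume image as it looked at the chosen point in time."""
    image = {}
    for ts, block, data in journal:
        if ts > point_in_time:
            break
        image[block] = data
    return image

record_write(100, 0, "v1")
record_write(105, 0, "v2 (oops, corruption)")
record_write(110, 1, "other data")

print(recover_to(104))   # {0: 'v1'} -- state just before the bad write
print(recover_to(120))   # full current state
```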
Traditional and continuous snaps can optionally be implemented in the storage array, at the server, or in the storage network. Each approach has its relative strengths and weaknesses.
Remote replication of snaps and backups often enters the picture here to assure that protection copies are stored a safe distance from the primary.
Snaps can be combined with other backup-related technologies, such as source and target deduplication, to create a broad range of new backup capabilities sometimes referred to as "next generation backup and archive".
Newer management tools have addressed the need to orchestrate and manage an end-to-end protection environment, whether using traditional or newer approaches, or any combination.
Finally, many of the newer approaches can offer a self-service model for both specifying backups as well as recovery when needed, easing the burden on centralized IT resources.
(key EMC technologies: RecoverPoint, all EMC arrays, Data Protection Manager, Networker)
6. What will be the latest technologies to support disaster recovery for data?
EMC believes that this area, in particular, will see dramatic changes in the near-to-medium term future.
Today's models center around designated sites with specific roles, i.e. primary and recovery sites. Different replication techniques (synchronous, asynchronous, point-in-time) can be dynamically combined to deliver the desired combination of RPO (recovery point objective) and RTO (recovery time objective) while balancing network costs. Additional capabilities provide for streamlined management, and can control the consistency of data across multiple related applications.
This traditional model of primary/secondary recovery sites is starting to give way to geographically dispersed models where data is redundantly dispersed and can be recovered from any subset of surviving nodes, much in the way parity schemes can recover from a failed disk drive.
These approaches can offer lower storage media costs than traditional remote replication, with the additional protection of being able to survive multiple site failures if desired.
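The underlying idea is the same one that lets RAID rebuild a failed drive, applied across sites instead of spindles. Here's a toy sketch using simple XOR parity across three data sites plus a parity site; real dispersal schemes use far stronger erasure codes, so treat this purely as an illustration.

```python
# Toy sketch of geographically dispersed protection: split an object across
# several sites and add an XOR parity fragment, so the object survives the
# loss of any one site. Real dispersal schemes use stronger erasure codes.

from functools import reduce

def xor_bytes(a, b):
    return bytes(x ^ y for x, y in zip(a, b))

def disperse(data, sites=3):
    """Split data into `sites` equal fragments plus one parity fragment."""
    frag_len = -(-len(data) // sites)  # ceiling division
    frags = [data[i * frag_len:(i + 1) * frag_len].ljust(frag_len, b"\x00")
             for i in range(sites)]
    parity = reduce(xor_bytes, frags)
    return frags + [parity]            # one fragment per site, plus a parity site

def recover(fragments, lost_index):
    """Rebuild the fragment at lost_index from the surviving ones."""
    survivors = [f for i, f in enumerate(fragments) if i != lost_index]
    return reduce(xor_bytes, survivors)

pieces = disperse(b"petabyte-scale object (in miniature)", sites=3)
rebuilt = recover(pieces, lost_index=1)   # pretend site 1 went offline
assert rebuilt == pieces[1]
print("fragment from the failed site rebuilt from the surviving sites")
```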
Products that support this scheme with relatively static files and objects are already entering the market; what remains is to provide similar capabilities for traditional "hot" block-oriented data.
When considering "hot" block-oriented data, many larger customers are now interested in using server virtualization to create "global clusters" of processing and information that not only provide workload balancing across multiple locations, but offer redundancy as a side benefit.
(key EMC technologies: EMC Atmos)
7. What will be the changes in (global) file systems?
EMC's view on this is somewhat controversial in the industry.
Although EMC actively develops and markets global file system products, we believe that the traditional hierarchical file system is starting to reach its architectural limits and will likely be difficult to extend to meet future customer needs.
As a result, EMC has been actively investing in object-based storage architectures for several years, with two object-based storage platforms currently in the market.
Object-based stores are different in that each stored object has a unique identifier, and directly supports rich metadata that aids greatly in information logistics: service level, location, security, compliance and more.
We believe that the "file vs. object" discussion is strategically relevant to any industry that has the twin challenges of managing petabytes of important information, and doing so on a global scale.
Update: I neglected to mention that -- yes -- we do very well with large-scale file systems (using Celerra as well as other file system technology partners), we do a lot with global file systems today with Rainfinity, and are very excited about the transition to pNFS. This particular customer has a use case that involves billions of file objects, hundreds of petabytes of information, and does business around the world, hence my focus on object-based approaches. Not everyone's cup of tea ...
(key EMC technologies: Rainfinity - global file system, Atmos - cloud optimized storage, Centera -- content addressed storage)
8. What will be the developments in tape? Format & densities? Is there still a role for tape or can storage replace all?
Tape is not entirely dead, but it's quickly becoming less important over time.
By combining various disk-oriented approaches (dedupe, spin-down, etc.) we can usually replace most traditional tape-oriented use cases with disk-based solutions that deliver both significant capex/opex savings as well as improved access to information.
EMC believes that -- as a result -- many customers will end up using tape as the bottom-most tier in their backup and archiving hierarchy, if at all, and use it for information that is one short step away from permanent deletion.
Most EMC disk-based backup and archive products support "dump to tape" (while retaining disk-based metadata for search, compliance, etc.) as a final step prior to deletion.
We do note that very little R+D is going into tape technologies these days, and the industry has rapidly consolidated into only a few viable players, foreshadowing less innovation and potential price increases.
(key EMC technologies: Avamar to tape, Centera to tape, Data Domain to tape, EMC Disk Library, etc.)
9. What will be the latest HSM / Archiving technologies?
Many organizations are taking a fresh look at this topic since the cost deltas between the fastest types of storage and the cheapest forms of storage are getting much wider.
The discussion has evolved into two parallel camps: active and passive.
"Active" archiving uses explicit metadata and associated policies to drive the ILM cycle for aging and retention of data.
By comparison, "passive" approaches simply use activity patterns and static policies to progressively age information.
Most organizations end up requiring a combination of both.
Since active archiving requires explicit metadata, this approach tends to be used in specific environments where metadata (and associated policy!) is readily at hand, or can be easily generated: email archives, document and object repositories, database extracts, indexed file systems, eDiscovery applications and so on.
However, a significant amount of corporate information is not amenable to generating and using metadata, and that is where passive approaches become more appealing -- large file systems, databases, unclassified repositories and the like.
EMC offers a wide range of capabilities to progressively and automatically age unclassified information based solely on pre-set policies and access patterns, including large-scale filesystems, transactional databases as well as newer object-based approaches.
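A "passive" policy really can be this simple in concept: a preset rule that demotes anything untouched for a given number of days to a cheaper tier. The thresholds and tier names below are made-up placeholders, not a specific EMC policy engine, but they show the shape of the logic.

```python
# Toy sketch of a "passive" archiving policy: no explicit metadata, just a
# preset rule that demotes anything untouched for N days to a cheaper tier.
# Thresholds and tier names are illustrative placeholders.

import time

DAY = 86400
POLICY = [                     # (max idle days, tier), checked in order
    (30,  "primary (flash/FC)"),
    (180, "nearline (SATA)"),
    (float("inf"), "archive (deduped / spun-down)"),
]

def place(last_access_ts, now=None):
    idle_days = ((now or time.time()) - last_access_ts) / DAY
    for max_idle, tier in POLICY:
        if idle_days <= max_idle:
            return tier

now = time.time()
for name, last in [("quarterly_report.doc", now - 2 * DAY),
                   ("old_project_data.db", now - 90 * DAY),
                   ("2004_scan_archive.tar", now - 1000 * DAY)]:
    print(name, "->", place(last, now))
```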
(key EMC technologies: SourceOne, Kazeon, RSA DLP, Centera, Atmos, Celerra, FAST)
10. What other technologies will become available to handle large (Petabytes) amounts of data?
Most of the industry focus now appears directed towards object-based storage repositories as the most viable approach for multi-petabyte storage farms that must be geographically dispersed. Several of these are in the market already from EMC.
Object-based approaches use a simple "claim check" abstraction to simplify storing and retrieving data. The object definition usually includes metadata that identifies source, service level, retention, security, etc., and that metadata is directly associated with the object, vs. being stored in some external repository.
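Here's a toy sketch of that claim-check model: you put an object plus its metadata, you get back a unique identifier, and that identifier (not a file path) is what you use to retrieve it later. This is illustrative only -- it is not the Atmos or Centera API.

```python
# Toy sketch of the "claim check" model: store an object plus its metadata,
# get back a unique identifier, and use that identifier (not a path) to
# retrieve it later. Illustrative only -- not the Atmos or Centera API.

import hashlib
import json

class ToyObjectStore:
    def __init__(self):
        self._objects = {}

    def put(self, data: bytes, metadata: dict) -> str:
        payload = data + json.dumps(metadata, sort_keys=True).encode()
        object_id = hashlib.sha256(payload).hexdigest()
        self._objects[object_id] = (data, metadata)   # metadata travels with the object
        return object_id                              # the "claim check"

    def get(self, object_id):
        return self._objects[object_id]

store = ToyObjectStore()
claim = store.put(b"scanned contract",
                  {"retention": "7y", "location": "EU", "service_level": "archive"})
data, meta = store.get(claim)
print(claim[:12], meta["retention"], data)
```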
More recently, this approach has been extended to incorporate and leverage geographical distance -- for example, making transient copies of information closer to users when needed, and reverting back to a single instance when no longer needed. As noted above, geographical dispersion of data can offer improved protection at lower costs than traditional approaches.
The challenge usually exists in bridging legacy information stores (file systems, repositories, etc.) with newer object-based approaches. EMC has invested in a number of bridging technologies to assist with this migration.
For large-scale information stores, EMC believes object-based approaches can easily be shown to offer order-of-magnitude better results than traditional file or block approaches.
(key EMC technologies: Atmos, Centera)
Questions We Frequently Get From Other Customers
While we're more than pleased to answer these questions in detail, we feel obligated to share with you some of the more interesting questions we get from our other customers.
Some of these may be useful to you, others not.
Here are some examples:
11. As storage technologies evolve, how do you see storage networking evolving? What is the role of internal (DAS) storage, whether disk or flash?
12. As more of the server and desktop landscape become progressively virtualized, what are the implications for storage and storage-related disciplines?
13. As more organizations move to internal service provider models, what are the implications on how storage is provisioned, orchestrated, monitored and reported on?
14. How do current and future governance, compliance and security concerns impact storage architecture and operations going forward?
15. How are larger organizations defining service catalogs to the business, and implementing newer chargeback and/or pricing models?
16. As more and more external storage service providers enter the market, how should we be architecting our environment to take advantage of these services, should we wish to at some point?
17. What does "best in class" storage and information management look like today? In two years? In five years? What are the organizational implications in terms of roles, skill sets and workflows? Are there implications for procurement strategies?
-----------------------------
The Bottom Line
It's nice to see a renewed wave of interest in large-scale storage strategies. A lot has changed in a relatively short period of time -- and more is likely to change in the near future.
How would you have answered some of these questions?
Chuck,
did you read your post before submitting it? Do you realise how convoluted and complex you have just made EMC's list of toolsets? It sounds like something out of a second-hand car yard: I have 100 cars that get you from A to B, most use different fuel and have different ways of getting there, but trust us because we have 100 cars to get you from A to B -- you might just need to swap cars, move your luggage and change a few parts along the way.
Not sure about you, but I'd prefer a little simplicity in my life.
Posted by: VStoragemonger | November 30, 2009 at 10:28 AM
And if you mention EMC products enable simplicity, I'll puke.
I have a customer trialling Avamar for 12 months, with a full-time resource on site from EMC to manage it, because EMC have stipulated they can cut the customer's backup storage by 25%.
The Catch.
The customer isn't allowed to manage it or to know how EMC are managing the backups.
Mystical mind trick: don't let the customer know a product is clunky till you receive the PO.
Posted by: VStoragemonger | November 30, 2009 at 10:33 AM
Hi Vstoragemonger
You're right, we have a lot of tool sets, don't we?
Some people aren't happy unless there's One Thing that solves all their problems. For some use cases, that's an achievable goal.
Other folks have big, hairy and complex problems to solve. They have to use multiple tools, often from multiple vendors, to solve their problems.
Different vendors have different strengths.
One of EMC's strengths is that we can offer reasonable "all-in-one" solutions for moderately complex problems (EMC Celerra, for example), and we also have the breadth and depth to do a lot of specialized stuff as well.
If the complexity is a recurring problem for you, there are always vendors willing to provide outsourcing, system integration, etc.
And, yes, we in the vendor world are always working to make things simpler, more integrated etc. Our thinking is that having an answer (of any sort) is better than no answer at all.
Thanks for writing ...
-- Chuck
Posted by: Chuck Hollis | November 30, 2009 at 11:37 AM
Vstoragemonger (not your real name)!
I went and followed your Twitter link.
Turns out that (a) you've never used it, and (b) you're using the name of "Paul S", who might be the same guy who works for NetApp and enjoys role-playing on the internet.
Most companies consider it a serious breach of ethics to pretend you're someone else. I don't know what NetApp's official policy is on this one, but at EMC that'd be cause for serious disciplinary action.
If I'm mistaken, please offer specifics that would help all of us understand you're a real person, and not some made-up entity from a very misguided competitor.
-- Chuck
Posted by: Chuck Hollis | November 30, 2009 at 12:59 PM
By the way Chuck, why do you think I'm having a go at Celerra? I actually suggested that it would be far easier for customers if EMC converged some of these toolsets to provide far better simplicity, rather than having tens of ways to skin a cat.
Posted by: VStoragemonger | November 30, 2009 at 05:44 PM
Vstoragemonger (or whatever you want to call yourself today)
I'm always up for an intelligent discussion.
Unless you can bring one of those here, I'd suggest you find entertainment elsewhere.
-- Chuck
Posted by: Chuck Hollis | November 30, 2009 at 11:13 PM
Why did you delete your update about Vstoragemonger's "top secret background"?
Anyway, I'm on the same page as he is. You should really deduplicate your portfolio, now that you claim to be an expert in it...
Posted by: Brainy | December 01, 2009 at 01:49 PM
Brainy
I know that the NetApp fanbois club likes to make a big deal of EMC's extensive portfolio as "complex, unneeded". Hey, whatever works for them.
When we go head-to-head, we focus on EMC Celerra most times. We go feature for feature, cost for cost, and do quite well. Our Celerra business continues to grow by leaps and bounds. IDC says we have more NAS market share, and it's been that way for a while.
Bottom line: if all you have is the bandwidth to focus on a single product, that's the product for you in most cases.
However, there's no downside to having more tools in your belt if you need them.
So, do you send snide comments to Toyota saying they have too many car models, and should delete a few?
C'mon, get real.
-- Chuck
Posted by: Chuck Hollis | December 01, 2009 at 01:59 PM
No comment about your "self-censorship?"
Toyota is maybe the best example of product de-duplication. They have a very limited offering of models, and you can only choose from a few model variants. This, their Kaizen development style, and not buying too many other companies has made them the #1 (ok, temporarily #2) car maker in the world.
EMC always talks about how they integrate all of their products. How would this be even possible, when you buy new companies all the time?
Seems very Anti-Kaizen to me ;-)
Posted by: Brainy | December 01, 2009 at 02:30 PM
Hi Brainy
Again, we're going to need a new handle for you.
How about "Argumentative"?
Toyota (along with Lexus and Scion) has dozens of branded models. They've done a great job of segmenting the market, solidifying their position in some segments, and investing in entering others (pickup trucks, luxury cars). They're even going after Ferrari, if you read the car mags.
Now, if your focus is mass market (e.g. Toyota Corolla), you're right, they've got it down to about 3-4 submodels, and maybe a half-dozen option packages. The extreme example is the Scion, which comes largely pre-optioned.
Toyota, overall -- a very different story. Go take a complete look, and then tell me what you think.
Automobiles are a relatively mature technology compared to this IT stuff. Not a lot of opportunity to buy R+D in that industry, just brands and market segments. If you give each industry more than a passing glance, you'd probably agree.
Regarding acquisition and integration -- you're right, we're buying companies all the time. If I were to put them on a timeline, you'd see the older acquisitions more fully integrated, and the newer ones less integrated.
Would this surprise you?
For someone who goes by the "Brainy" handle, I'd suggest you raise the bar a bit on your comments.
-- Chuck
Posted by: Chuck Hollis | December 01, 2009 at 05:32 PM
And still no comment, why you deleted the update about Vstoragemonger. Something to hide?
Look, it was not me who brought Toyota up, which in this case was probably the worst comparison.
You should have brought GM up, looks a lot more like EMC. Has even the word "General" in it :-)
Posted by: Brainy | December 02, 2009 at 01:22 AM
Brainy
While I'm sure you're enjoying yourself tossing out these half-baked comments and insults, it gets a bit tiresome for everyone else.
How about you try and offer something intelligent to the discussion?
Thanks
-- Chuck
Posted by: Chuck Hollis | December 02, 2009 at 07:16 AM
1. A few comments up, you claimed that vstoragemonger might be a NetApp employee.
2. You don't have an explanation for the pulled update to your comment, after you found out he is not.
- I think the reason would be interesting to many others. At least to vstoragemonger himself.
3. It is you who insulted me, by proposing that I should change my nickname.
4. Let's go back to my question. Why was the comment update pulled?
5. You can censor my question, but I can put it up somewhere else. But I think your readers would prefer to read the answer here.
Posted by: Brainy | December 02, 2009 at 01:01 PM
Brainy --
I edited my comment because I had incorrect information.
No big deal to most people -- why is this such a big deal to you?
Sorry if you didn't like my suggestions around your handle. I thought, well, since you feel comfortable dishing out random insults, you'd fully expect something similar in return.
My advice to you is simple: treat others how you would like to be treated. That's true in the physical world, and also the online world.
One last chance to change the tone of your discussion here; otherwise your off-topic rants will probably be banned going forward.
I am under no obligation whatsoever to put up with this nonsense.
-- Chuck
Posted by: Chuck Hollis | December 02, 2009 at 03:07 PM