You know, there's so much neat stuff in our product portfolio, occasionally some of the more interesting bits get overshadowed by some of the flashier stuff.
There's a ton of bread-and-butter functionality in EMC's portfolio that deserves the occasional airing. And, from time to time, I'd like to do just that -- with your permission.
The recent EMC press release on new Replication Manager functionality for VMware made me stop and think -- does anyone know the full back story here? As far as storage topics go, it's an interesting story in the evolution of thinking around a basic storage function: making local copies.
Hey, we even got a bit of coverage from Beth Pariseau ...
Replication Management At Scale
I believe EMC was the first vendor out there with local replication, dating back to 1995. The feature was called TimeFinder, and it initially made BCVs -- complete and separate copies of production databases, with the array itself making the copy.
It turned out to be wildly popular at the time. Customers found all sorts of uses for making a local copy, and saving time by working on the copy, rather than the primary.
Yes, there's always the need for "gold" quick recovery copies. But it went well beyond that.
Backup acceleration, for example -- if the production application has to be up and humming 7x24, making a backup from a copy of a database is very convenient.
Same general story for running mongo reports off a database that should be focusing on online responsiveness. Or test and development from a production copy.
Indeed, back in the mid 1990s when TimeFinder was all the rage, we found all sorts of convenient uses for a production copy of an important database. In certain SAP environments, some customers would have as many as four (!) independent copies of the production instance.
Mirrored, of course ... :-)
Just to be complete, space-saving snaps weren't really practical in many of these use cases. Customers needed spindle isolation from the production instance -- the whole idea was to be able to use (a copy of) the database without impacting production, and that meant isolating the copy from the source.
Today, most snap implementations just don't do that -- it's all intermixed. Yes, it saves on space, but you may have to think through who's hitting which spindles when.
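To make the spindle point concrete, here is a toy sketch of a space-saving snap next to a full clone. It assumes a copy-on-first-write design, which is only one common way to build snaps; the names `Volume`, `CofwSnapshot` and `full_clone` are made up for illustration and are not any EMC interface. The read counter stands in for load on the production spindles.

```python
class Volume:
    """A toy block device; `reads` stands in for load on its spindles."""

    def __init__(self, blocks):
        self.blocks = list(blocks)
        self.reads = 0

    def read(self, i):
        self.reads += 1
        return self.blocks[i]


class CofwSnapshot:
    """Space-saving snap: a block is preserved only when production first overwrites it."""

    def __init__(self, source):
        self.source = source
        self.saved = {}  # block index -> contents at the time of the snap

    def write_source(self, i, data):
        # Copy on first write: keep the old block before production overwrites it.
        # (The extra I/O this copy costs production is not counted in this toy.)
        if i not in self.saved:
            self.saved[i] = self.source.blocks[i]
        self.source.blocks[i] = data

    def read(self, i):
        # Changed blocks come from the save area; unchanged blocks hit production.
        if i in self.saved:
            return self.saved[i]
        return self.source.read(i)


def full_clone(source):
    """Full copy on separate storage; later reads never touch the source."""
    return Volume(source.blocks)


prod = Volume(["a", "b", "c", "d"])
snap = CofwSnapshot(prod)
clone = full_clone(prod)

snap.write_source(1, "B")  # production overwrites block 1 after the copies exist

snap_view = [snap.read(i) for i in range(4)]
reads_caused_by_snap = prod.reads
clone_view = [clone.read(i) for i in range(4)]
reads_caused_by_clone = prod.reads - reads_caused_by_snap
```

Both views show the data as it was when the copy was taken, but every unchanged block the snap reader touches is served by the production volume, while the clone reader never goes near it.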
Before long, we had lots of customers doing lots of local replication.
And how was it all managed way back then?
Scripts. Lots and lots of scripts.
And Scripts Do Have Their Limits
Towards the end of the 1990s, we had many many thousands of customers using TimeFinder, and we were writing a *lot* of scripts for our customers. These scripts would do things like freeze/thaw an Oracle database, make the copy, run some sort of validation against the copy, and then invoke whatever function needed to be done to the copy, for example, kick off a backup job or run a bunch of reports.
Scripts had their limits, though. When the environment changed, they tended to break. Adding or changing functionality required someone who actually knew what they were doing. It was getting clear to many of us that scripting wasn't going to be the right answer here in the long term.
In parallel, EMC was achieving some moderate success with a product known as EDM -- basically, a high-performance backup engine, integrated with storage. In addition to doing traditional LAN-oriented backups, it had the neat capability of being able to integrate directly with a Symmetrix storage array, and use I/O channels to do a complete image backup/restore off the array at astounding speed.
It did this by using TimeFinder, which did a full physical split of a BCV -- business continuance volume. And, along the way, we developed some initial workflow to coordinate the pre and post processing of the split, e.g. freeze the database, make the split, run a logical consistency check on the copy, invoke the backup task, etc.
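For readers who never saw one of these workflows, the sequence can be sketched roughly as follows. Everything here is hypothetical: the function names are placeholders that only record what they would do, not TimeFinder or EDM commands, and thawing the database right after the split is my assumption about the usual ordering.

```python
from contextlib import contextmanager

log = []  # records the order of steps so the sequence can be inspected


@contextmanager
def frozen(database):
    """Keep the database quiesced inside the block and always thaw it afterwards."""
    log.append(f"freeze {database}")
    try:
        yield
    finally:
        # Runs even if the split fails, so production is never left frozen.
        log.append(f"thaw {database}")


def split_copy(database):
    """Placeholder for asking the array to split off a copy."""
    log.append(f"split copy of {database}")
    return f"{database}-copy"


def validate(copy):
    """Placeholder for a logical consistency check on the copy."""
    log.append(f"validate {copy}")
    return True


def backup(copy):
    """Placeholder post-processing task."""
    log.append(f"backup {copy}")


def run_job(database, post_task):
    with frozen(database):
        copy = split_copy(database)
    if not validate(copy):
        raise RuntimeError(f"{copy} failed validation")
    post_task(copy)
    return copy


result = run_job("orders", backup)
```

The point of the `finally` is the part that hand-written scripts tended to get wrong: whatever happens during the split, the production database gets thawed.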
At the beginning of 2000, these ideas evolved into a broader desire for a replication management product.
The big idea? It should be conceived around the sophisticated workflows people want to use replication for, and not simply a handy-dandy GUI for controlling snaps. Arguably, there are many use cases for snaps and clones, some simple, some not. EMC decided to focus on perhaps the most complex and sophisticated use cases.
Initial Requirements
We wanted it to be able to support the full range of EMC replication products. As you know, that's a pretty long list of array-based and network-based replication.
We wanted it to be able to support most any popular operating system and file system. We wanted it to have detailed knowledge of common use-case applications like Oracle, Exchange, SQL Server, UDB and applications like SAP, and be able to produce logically consistent copies using whatever application-specific mechanisms were available.
Going a bit further, we thought it should be able to do things like discover the environment (application, servers and storage) and the relationships between the entities.
It should be able to drive workflows that included things like automatically mounting the copy on another server, running validation scripts and kicking off batch jobs against the copies.
And, of course, it should be able to do things like paint a big picture of all your replication tasks, all your replication resource pools, and the like.
More Advanced Requirements
Being EMC, and knowing the customers we typically sell to, there was more that ended up being important. Things like being able to produce consistent replicas of logically related databases.
Not everyone is familiar with federated databases and the challenges they present, but -- oversimplifying -- imagine you had two databases, Orders Received and Orders Shipped. You'd want to make sure that they're always in step. You wouldn't want to recover from a problem and have them be slightly out-of-sync -- it'd create all sorts of havoc.
The concept is often called a "consistency group" -- multiple databases, potentially on multiple arrays, that all have to be lock-step in terms of logical view at a given point in time. EMC Symmetrix has supported this sort of logically consistent replication -- even across multiple arrays! -- for many years. It's especially popular in larger SAP environments, not to mention big mainframe environments.
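A toy sketch of why the group matters, using the two databases from the example above. This is only an illustration: the `Database` class and `replicate_group` are invented here, and a real array holds writes at the storage level rather than through a flag like this.

```python
class Database:
    """A toy database: a list of order numbers plus a 'frozen' flag."""

    def __init__(self, name):
        self.name = name
        self.rows = []
        self.frozen = False

    def insert(self, row):
        if self.frozen:
            raise RuntimeError(f"{self.name} is frozen")
        self.rows.append(row)

    def copy(self):
        return list(self.rows)


def replicate_group(databases):
    """Freeze every member, copy every member, then thaw every member."""
    for db in databases:
        db.frozen = True
    try:
        return {db.name: db.copy() for db in databases}
    finally:
        for db in databases:
            db.frozen = False


received = Database("orders_received")
shipped = Database("orders_shipped")
received.insert(1)
shipped.insert(1)

# Independent copies: order 2 arrives and ships between the two copies.
naive_received = received.copy()
received.insert(2)
shipped.insert(2)
naive_shipped = shipped.copy()

# Group copy: no member can change while any member is being copied.
group = replicate_group([received, shipped])
```

The independent copies disagree: the Orders Shipped copy contains an order that the Orders Received copy has never heard of. The group copy shows both databases at the same logical point in time.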
We also wanted to be able to handle remote splits. Again, we're talking larger shops here, but it was often the case that a customer was using SRDF, and wanted to split a copy off the remote end and do the backup, reporting, or whatever at the remote side.
And, while we're talking larger shops, we needed a role-based management model where different people could do different levels of tasks -- everything from simple operations to the uber-expert, with everything in between.
And, just to make things interesting, a lot of these use cases ran in failover cluster environments, with the appropriate variety of technologies: MSCS, VCS, Oracle RAC, Sun Cluster, HP MC/ServiceGuard and so on.
Not to mention all the different flavors of local copies: mirrors, clones, snaps, etc. across all the different EMC products.
Not too much to ask, is it?
A Daunting Task
I think the trick with this sort of management software is to make things as simple as possible, yet be able to expose deeper functionality when needed. As you navigate through the software, simpler higher-level functions give way to deeper levels of options, script inclusion and so on.
Another important capability is to embed "knowledge" of the environment into the product. This includes things like being able to autodiscover resources and relationships, but goes further to include things like how Exchange likes to do things, how Oracle likes to do things, and so on.
The product first came to market in 2001, and -- frankly speaking -- it didn't really set the world on fire.
Why? It was a new way of doing things. People thought it was a great product idea that solved a real problem, but there was a catch. The big customers who were already extensively using TimeFinder and SRDF had already made the investment in heavy scripting, and felt they could keep using what they had.
And, for newer customers, unless you had really experienced what orchestrating lots and lots of replicas was all about, you really couldn't understand why we had built it.
But, over time, we slowly won over customers, one at a time. The more complex the replication scenarios, the more they could appreciate what this product could do for them.
However, all products come with tradeoffs. If someone only wanted a simple GUI to manage a few local snaps, this product was perhaps too feature-rich.
What Was Once Physical Is Now Virtual
The transition to virtualized infrastructure creates two sets of challenges: some are simply things you did in the physical world that you have to now do in the virtual world, and others are entirely new to virtualization.
As VMware is supporting more and more sophisticated workloads and application landscapes, it's no surprise that people want to coordinate multiple replication activities, and orchestrate pre and post processing of the replicas.
So, it being VMworld and all, the press release talks about improved support for VMware environments. Actually, that's a bit of a misnomer -- what's really new is improved support for popular application environments (Exchange 2003 and 2007, SQL Server 2005, Oracle 10g/11g, etc.) -- all running in virtual machines.
Actually, the press release probably says more about how we're seeing VMware being used these days than anything else ...
Where Does That Leave Us?
Today, all sorts of storage arrays support snaps. They're incredibly convenient for all sorts of things. We could debate the relative merits of full clones vs. partial snaps, copy-on-first-write vs. other approaches, and so on. I do think that EMC has most of these use cases covered with its various flavors of replication technology.
And yes, we get our fair share of criticism for having a rich variety of replication technologies.
But I think where we've differentiated ourselves here is not providing simple management of snaps; we've built a workflow environment that lets customers use local and remote replication together in all sorts of interesting use cases.
Useful in the physical world, now useful in the virtual world.
And we don't have to write scripts anymore :-)
Courteous comments always welcome!
After having a dig about Storage Virtualisation, I must say I like RM a lot. I wish I could get my guys to move away from scripting and make more use of RM. It certainly helped us in the Exchange area. Don't suppose you could make it heterogeneous and support other vendors?
Posted by: Martin G | September 16, 2008 at 02:28 PM
Chuck,
I have been using RM in our production environment for two years now and it keeps getting better! We develop and support a custom in-house app based on SQL 2005 and running on Windows clusters, with the databases ranging from 500GB to 2TB in size. We use RM to create nightly clones of the production databases and present them to the development servers for daily development. The SQL integration and the automation of jobs have enabled me to deliver a solution for our developers that they could not build themselves because of the amount of data we are cloning each night.
We have had our issues with the product, but overall it was well worth the investment and automates a critical function in our environment. And now we can use it to automate our RecoverPoint bookmarks for SQL and VMware! Good job guys!
Cheers!
Posted by: Aran | September 17, 2008 at 06:17 PM
Aran -- you made my day!!
Thanks so much for sharing. And I'm sure the RM team will appreciate this as well.
-- Chuck
Posted by: Chuck Hollis | September 17, 2008 at 08:30 PM
It might be a hidden gem, but at what point does an investor get compensated for EMC gems and at what point does he learn they are really pyrite?
EMC does a terrible job at actualizing value from their "gems."
Posted by: David | September 18, 2008 at 11:29 AM
David,
I don't know how familiar you are with Replication Manager and its pricing, but it really is not that much. You are basically paying for a license per host being managed, on par with license costs for backup agents from most vendors.
And it was even cheaper for us because we had a couple of old RM/SE 3.0 licenses that EMC upgraded for free to RM 5.0 licenses. They did that for all customers that had RM/SE licenses when RM 5.0 came out and RM/SE reached End Of Life.
And compensated? I didn't purchase RM to make money but to save time and effort in managing terabytes of replications and restores for our developers. In the end this saved a lot of time our developers wasted managing restores for all of our development environments.
Hmmm, now that you mention it. Thinking about all of the above we came out ahead in the end after all.
Quite the gem indeed.
Aran
Posted by: Aran | September 18, 2008 at 03:58 PM
I had to argue with both my EMC sales and engineering teams to keep this product as part of our new SAN/NAS architecture. They kept telling me I had enough replication products, although not one would quiesce my Exchange or SQL environments for a snapshot or clone. In the end it was my money and RM was cheap. Now that it works with RecoverPoint (which I now own) and VMware (ditto, and we use RPA with it), I feel even more justified in keeping the product. Thanks to the RM team for a great job.
Posted by: Anthony | September 28, 2008 at 08:52 AM
Reading your comments about RM (Replication Manager), I understood that this product should do backup/restore like a snap. However, I found out the whole truth today: the latest version of RM does not do what you advertise. RM only replicates at the VOLUME level, not at the OS level. That means that when you have a folder with both SHARE permissions and NTFS security permissions, and you replicate that folder to a mount host on another set of disk arrays, you CANNOT restore the SHARE permissions as you said in the blog. It only restores the NTFS security permissions; the SHARE on the folder is stripped out during the restore. It is as if a photographer took a picture of a human face but left the eyes out of the image: when you print the image, you do not see the eyes. I have learned a painful lesson about EMC "Replication": after a restore from RM, users need to manually reconstruct the SHARE permissions on every single folder in the volume.
Other cheap software like Secure Copy does a better job, replicating the content as well as the attributes of the files and folders it copies.
Please revisit the function of the RM product. I hope that you will find out the truth and make it work as it is supposed to.
Thanks,
Tung
Posted by: Tung Phan | October 14, 2008 at 07:40 PM
Hi Tung
I'll check, but I think you are misunderstanding something basic.
Replication Manager only orchestrates the copies, it doesn't make the actual copies.
For file systems, that would be a filesystem-based mechanism (e.g. Celerra SnapSure or similar).
I'm not quite sure why you're so angry at me and insinuating that there's something "wrong" with the product.
Nothing in my blog post spoke to NTFS snaps, where did you get that from? I've read your comment several times, and I am having trouble figuring out what exactly you're trying to do, and why.
Again, I think there's a way to do what you want to do. Do you need some assistance?
Thanks -- Chuck
Posted by: Chuck Hollis | October 15, 2008 at 12:41 AM
Does anyone know if RM is compatible yet with IBM's VIOS, which allows virtualization of HBAs for IBM LPARs? We own Replication Manager, but there seems to be no support for VIOS, or even interest in supporting it, that I'm aware of.
Posted by: Tom Whalen | February 21, 2009 at 03:48 PM
Does anyone know if you can configure replication manager to clear an archive bit (uncheck file attribute after successful backup)?
Posted by: S.L. | December 13, 2011 at 02:41 PM