In the automotive world, a defective part from a supplier can result in expense and tragedy. Witness the massive Takata airbag recall, affecting dozens of manufacturers and tens of millions of vehicles on the road today, including potentially yours :(
The IT world is no different: we all depend on components from others, especially Linux and open source code. Bugs are found, some of them serious -- and they must be patched quickly, at considerable effort and expense, otherwise tragedy may await.
The recent bug in glibc -- the C library that nearly every Linux program links against -- was severe enough to prompt a "PATCH NOW!" directive to the IT community at large. While not as nasty as the infamous Heartbleed or Venom bugs, this one merited a serious and immediate response.
For many IT shops, this sort of all-too-common fire drill involves not only a lot of effort, but downtime as well.
Except for Oracle shops, that is.
The Magic Of Ksplice
One approach is to simply update the code on storage and restart everything. Downtime is never ideal, and you silently cross your fingers as things come back up.
Unfortunately, that's so often the norm.
Another popular approach is to use clustering technology for fast failovers: update the shared code on storage, then fail over each server in sequence to minimize production impact. Better, but it's still a process that deserves close supervision.
These days, there's a third approach, and that's Ksplice.
Ksplice is a Linux framework, pioneered by Oracle, that hot-swaps code modules on the fly with ZERO disruption. The old code is unhooked, the new code is hooked in -- and the application is completely unaware that anything has happened.
But what about user space code, like the aforementioned glibc?
User space code is much harder: Oracle has to develop an individual harness for each specific library in wide use, so it's not a generic capability. If you're running user code that has been modified with one of our Ksplice harnesses, you're golden -- otherwise you have to patch like everyone else.
Fortunately, the code in question already had a Ksplice harness as part of our distribution. Anyone running recent Oracle Linux code (and subscribing to the appropriate service) was able to non-disruptively patch the defective library code with ZERO downtime and ZERO impact to production.
It's an impressive feat, one that I'm sure was appreciated by more than a few Oracle customers.
It should be pointed out that it's the same Linux used in our Oracle Engineered Systems: Exadata, Exalogic, Exalytics, SuperCluster, Oracle Database Appliance, Big Data Appliance, Private Cloud Appliance, Zero Data Loss Recovery Appliance and so on.
The Value Of The Red Stack
Good things happen when one vendor engineers the entire stack -- like automatically hot-patching a critical bug across a vast army of Linux-based systems with no downtime.
While I do see the historical appeal of rolling your own IT stack, it's hard to argue with the ease and convenience of a completely engineered one. Including nifty features like this.
For all you non-Oracle Linux admins out there: how long did it take you to find, patch and restart every affected system?
Just curious :)