There was a time decades ago when I was intensely interested in CPU technology. RISC, CISC, all that.
Endless debates about which one was "better", which one was going to win in the long term, etc.
Well, we know how the story ended up for most of the datacenter market: it’s mostly an Intel world. Like most people, I thought "well, that's that": most everything was going to run on x86 unless there was a good reason not to.
Sort of like regular-grade unleaded gas, or basic cable TV.
Fast forward a bunch of years, and I land at Oracle. I find that the Sun-derived technologies are alive and well, and carving out a fascinating market segment where everyday x86 doesn't do so well: demanding workloads where a smaller number of smarter processors can do more work than a boatload of familiar x86 chips.
As part of this week's Oracle Open World and the launch of the new M7 processor, I enjoyed getting my CPU geek back on, and found a lot to really like.
I think you will too.
Generic Vs. Co-Engineered
In my opinion, the familiar x86 has become generic -- the basic architecture is expected to support a vast universe of workloads and use cases.
It doesn't have to be exceptionally good at any particular use case, just decent at all of them.
Oracle created the M7 as an optimized engine for enterprise software: Oracle's and others. How many CPUs have been designed by enterprise software companies? Not many, I'd argue.
Indeed, the theme for the M7 is "software in silicon", and it’s an apt description.
You'll see clear evidence of this extensive co-engineering as we dig through the highlights.
But, make no mistake, the M7 is awfully interesting -- even without the presence of Oracle software.
Basic Speeds And Feeds
At first glance, it's pretty clear the M7 is nothing like a familiar x86 processor.
32 cores, check. 4.13 GHz clock speed, check. Outrageous memory and IO bandwidth, check. Huge chip cache, check. 8-way SMP using glueless logic, 16-way using switch ASICs, yup.
OK, it looks pretty freakin’ fast – but you'd expect that from a brand-new CPU.
So, what else is there?
Security In Silicon
Upping the security game is on everyone's mind these days. It's war out there, and advanced weaponry is very much in demand.
The first major new feature here is silicon secured memory.
It's billed as the first-ever hardware-based memory protection for applications -- dramatically improving both security and reliability.
Here's the problem: malicious (or buggy) code can access and/or overwrite memory. The infamous Heartbleed attack was a buffer over-read attack; Venom was an over-write attack.
Silicon secured memory can help prevent both.
How it works is pretty slick: hidden "color" bits are added to memory pointers (the key) and to in-memory content (the lock). If the two don't match, the access is aborted, and can be trapped for further processing.
And because it's based in hardware, it can run all the time with almost no performance impact.
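To make the lock-and-key idea concrete, here's a toy software model in Python. It's only a sketch: the real check happens in hardware, and the class names, the per-byte granularity, and the way colors get handed out are all assumptions made for illustration, not details of the M7.

```python
class ColorMismatch(Exception):
    """Raised when a pointer's color (key) differs from the memory's color (lock)."""


class TaggedMemory:
    """Toy model: every byte carries a hidden color, and so does every pointer."""

    def __init__(self, size):
        self.data = bytearray(size)
        self.colors = [0] * size      # the "lock" stored alongside memory
        self._next_color = 1          # hypothetical allocator state

    def allocate(self, start, length):
        """Color a region and return a pointer as (base address, color)."""
        color = self._next_color
        self._next_color += 1
        for addr in range(start, start + length):
            self.colors[addr] = color
        return (start, color)         # the "key" travels with the pointer

    def _check(self, pointer, offset):
        """Abort the access unless the pointer's color matches the target byte."""
        base, color = pointer
        addr = base + offset
        if not 0 <= addr < len(self.data) or self.colors[addr] != color:
            raise ColorMismatch(f"access at address {addr} aborted")
        return addr

    def read(self, pointer, offset):
        return self.data[self._check(pointer, offset)]

    def write(self, pointer, offset, value):
        self.data[self._check(pointer, offset)] = value


mem = TaggedMemory(16)
buf_a = mem.allocate(0, 8)    # bytes 0..7 get one color
buf_b = mem.allocate(8, 8)    # bytes 8..15 get another
mem.write(buf_a, 0, 65)       # key matches lock: allowed

try:
    mem.read(buf_a, 8)        # over-read that runs into buf_b's region
except ColorMismatch as err:
    print("trapped:", err)
```

The over-read fails even though address 8 is perfectly valid memory, which is the point: the pointer simply carries the wrong key for that lock.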
The second, perhaps even more useful, feature is a new set of dedicated crypto engines.
32 of them per processor, to be exact, supporting the broadest range of ciphers in the industry.
This design enables strong encryption to be used universally, with almost no degradation in performance.
That’s important, because up to now there was a difficult choice to be made – do you want performance, or encryption? Extreme performance with full encryption changes that equation.
SQL In Silicon
Databases are not only getting bigger, they're moving into memory quickly -- the need for speed never ends, does it?
The M7 once again has two very cool (and unique) capabilities, both implemented in a co-processor known as the DAX, or data analytics accelerator.
The first feature leverages the ability of Oracle Database 12c to support dual formats in memory: rows for transactions, columns for analytics. Each DAX engine is able to load multiple columnar values from main memory into special registers, and scan them completely independently of the core processor.
The result is nothing short of amazing: the ability to scan *tens of billions* of values per second, per processor. Nothing else even comes close. Small data sets, not a dramatic improvement. Big data sets, enormous improvements.
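If the rows-versus-columns distinction is new to you, here's a small Python sketch of the idea. It says nothing about how the DAX or the database actually lay out memory; the table, the column names, and the `scan_greater_than` helper are all made up for illustration.

```python
from array import array

# Row format: one tuple per order, convenient for transactions.
rows = [
    (1, "east", 120),
    (2, "west", 75),
    (3, "east", 310),
    (4, "north", 40),
]

# Column format: one contiguous array per attribute, convenient for scans.
amounts = array("q", (row[2] for row in rows))


def scan_greater_than(column, threshold):
    """Return the positions of the values in `column` that exceed `threshold`."""
    return [i for i, value in enumerate(column) if value > threshold]


# An analytic query touches only the one column it filters on,
# then uses the matching positions to fetch the full rows.
hits = scan_greater_than(amounts, 100)
matching_rows = [rows[i] for i in hits]
print(matching_rows)
```

The scan walks one tightly packed array instead of hopping through every field of every row, which is the kind of loop that lends itself to being handed off to a dedicated engine.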
The second feature in the DAX is a set of special memory decompression engines that can unpack in-memory database values at full memory speed, greater than 120 GB/sec.
This means larger databases can fit in smaller memory spaces, in addition to improving read speeds from main memory to CPU.
One test showed roughly 6x compression: a 1 TB Oracle database shrank to 160 GB. Fit databases six times larger, or use only one-sixth the memory -- you get the idea.
Again, impressive.
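For anyone who wants to check the compression arithmetic, here's a quick Python sketch. The only figures taken from the test are 1 TB and 160 GB; whether "1 TB" means 1000 GB or 1024 GB isn't stated, so both are shown, and the 512 GB memory size in the last line is just a made-up example.

```python
def compression_ratio(original_gb, compressed_gb):
    """How many times smaller the compressed copy is."""
    return original_gb / compressed_gb


def data_that_fits(memory_gb, ratio):
    """Uncompressed data volume that fits in `memory_gb` at a given ratio."""
    return memory_gb * ratio


ratio_decimal = compression_ratio(1000, 160)   # 1 TB taken as 1000 GB
ratio_binary = compression_ratio(1024, 160)    # 1 TB taken as 1024 GB
print(f"{ratio_decimal:.2f}x to {ratio_binary:.2f}x")

# Hypothetical example: a 512 GB memory budget at the lower ratio.
print(data_that_fits(512, ratio_decimal), "GB of uncompressed data")
```

Either way you slice the units, it lands a little north of 6x.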
So, What Can All This Superfast Stuff Really Do?
Let's start with the 32 crypto engines on the M7.
An M7 has a completely unfair advantage over both Intel x86 and IBM's Power series. Results will vary by cipher used, but a range of 4x-35x faster can be reasonably expected.
If you're moving to a world where strong encryption is used by default, that's a huge advantage.
Here's another way to look at it.
Consider a single M7 CPU in the new Oracle T7-1 server.
Imagine you're ingesting 8 GB/sec of AES-128-CBC encrypted data, which needs to be processed, and ultimately sent to disk.
One test showed that only 19% of CPU resources were consumed by crypto, leaving 81% to do useful application work.
Not enough for you?
Oracle builds some pretty big systems.
The largest M7-16 (sixteen M7 CPUs) can theoretically encrypt/decrypt at an amazing 1.3 terabytes per second. Good news, smaller sizes are available.
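Here's a back-of-the-envelope breakdown of those crypto numbers in Python. The inputs (8 GB/sec at 19% CPU, 1.3 TB/sec across sixteen CPUs, 32 engines per CPU) come from the figures above; the even split across CPUs and engines, the linear extrapolation, and the decimal units are my assumptions, so treat the outputs as rough guides rather than measurements.

```python
CRYPTO_ENGINES_PER_CPU = 32       # from the M7 description
M7_16_CPUS = 16
M7_16_PEAK_GB_PER_SEC = 1300.0    # 1.3 TB/sec, taken as decimal GB

# Theoretical peak, split evenly across sockets and engines (assumption).
per_cpu_gb_per_sec = M7_16_PEAK_GB_PER_SEC / M7_16_CPUS
per_engine_gb_per_sec = per_cpu_gb_per_sec / CRYPTO_ENGINES_PER_CPU

# T7-1 test: 8 GB/sec of AES-128-CBC cost 19% of the CPU.
t7_1_rate_gb_per_sec = 8.0
crypto_cpu_share = 0.19

# If the cost scaled linearly (assumption), this is where the CPU would max out.
linear_ceiling_gb_per_sec = t7_1_rate_gb_per_sec / crypto_cpu_share

print(f"per CPU:        {per_cpu_gb_per_sec:.2f} GB/sec")
print(f"per engine:     {per_engine_gb_per_sec:.2f} GB/sec")
print(f"linear ceiling: {linear_ceiling_gb_per_sec:.1f} GB/sec")
```

The two per-CPU figures don't line up, and they shouldn't be expected to: one is a theoretical peak, the other is extrapolated from a single measured workload with one particular cipher.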
What about query performance?
Flash storage -- in any form -- can't even begin to compete with the combination of CPU and memory technology used here.
One early beta customer found an astounding 83x performance bump on a routine query vs a flash array. As always, your mileage may vary.
Indeed, the levels of performance are so astounding that it's perfectly reasonable to assume a single system can do both OLTP and heavy analytics, with ample performance to spare.
How about big data performance?
So glad you asked.
Here's the result of another test running Hadoop-based Terasort against a 10TB datastore.
Just to make things more unfair, the M7 was set to do full AES-256-GCM crypto; the x86 and the Power processors weren't.
A four-way T7-4 was 3.8x faster than an eight-way Power cluster, and 3.5x faster than x86 -- while running fully secured.
What about cloud apps?
More good news.
A single T7-4 (small, four-way system, 128 cores) was put up against an impressive configuration of 12 Cisco C240 M3 servers. The test used the Yahoo Cloud Serving Benchmark against the Oracle NoSQL Cloud Database.
Not even close.
The vastly smaller (and cheaper) T7-4 delivered almost twice the performance of the larger, more expensive Cisco configuration.
What Does All This Mean?
Not everybody needs always-on encryption at memory speeds. And not everyone needs screaming performance from in-memory databases, or silicon secured memory, or smaller systems that do the work of much larger ones, or any of the other amazing capabilities of the M7 as compared to alternatives.
But some people do. And you know who you are.
That being said, everybody is looking for ways to reduce costs, and there's an interesting proposition here.
Let's assume for the moment that the M7 does substantially more work per core, and more work per socket. That impacts IT costs in two substantial ways: obviously less hardware, and also fewer licenses for compute-licensed software. I'd encourage people to do their own math.
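If you want a starting point for that math, here's a minimal Python sketch. Every number in it is a placeholder I made up (workload size, per-core throughput, cores per socket, prices), and real licensing terms are more involved than a flat per-core fee, so plug in your own figures.

```python
import math


def cores_and_sockets(total_work, work_per_core, cores_per_socket):
    """Cores and sockets needed to cover `total_work` units of load."""
    cores = math.ceil(total_work / work_per_core)
    sockets = math.ceil(cores / cores_per_socket)
    return cores, sockets


def total_cost(cores, sockets, license_per_core, hardware_per_socket):
    """Flat per-core licensing plus per-socket hardware (a simplification)."""
    return cores * license_per_core + sockets * hardware_per_socket


# Placeholder scenario: same workload, two hypothetical platforms.
WORKLOAD = 1000                      # arbitrary units of work
LICENSE_PER_CORE = 1000              # made-up price
HARDWARE_PER_SOCKET = 5000           # made-up price

baseline = cores_and_sockets(WORKLOAD, work_per_core=10, cores_per_socket=16)
faster = cores_and_sockets(WORKLOAD, work_per_core=20, cores_per_socket=32)

for name, (cores, sockets) in (("baseline", baseline), ("faster", faster)):
    cost = total_cost(cores, sockets, LICENSE_PER_CORE, HARDWARE_PER_SOCKET)
    print(f"{name}: {cores} cores, {sockets} sockets, cost {cost}")
```

The shape of the result matters more than the made-up prices: doubling work per core halves the licensed core count, and denser sockets cut the hardware count on top of that.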
Congratulations to John Fowler and the Oracle engineering team for delivering such an amazing piece of silicon.
Your fan club should grow considerably as a result :)
--------------------------------
very enlightening article. Thanks Chuck
Posted by: Tom | December 17, 2015 at 09:22 AM