
At the last user conference for some iSeries-based software that we run, IBM had a booth where they displayed a 2U server with dual 16-core Power7 CPUs. They bragged that it only ran Linux and would save us a ton of money. The servers started out at about $20,000 US.

How small is this market? You'd have to have apps that were written to take advantage of Power7. Equivalent x86 Linux servers are 1/4 the price.




I think the horsepower of those machines shouldn't be underestimated, because they are not as equivalent as you think... I was thoroughly surprised when an unoptimized (but correct!) ChaCha20/8 implementation I wrote ran on a 3.0GHz POWER8 little-endian machine about as fast as AES-256 with AES-NI on the latest 3.5GHz Xeons (about 1.3cpb vs 1.0cpb IIRC, but the latter has a dedicated hardware unit for it!). On that same Xeon, the ChaCha20 code only hit somewhere around 5cpb - that's software vs silicon!

That POWER8 machine also has 170 cores and was actually a QEMU instance (w/ hardware virtualization extensions) vs raw dedicated metal. If you're doing any kind of numerical or analytic workloads (even databases), I wouldn't throw them aside so quickly. You can even get CUDA for them these days, and certain physical add-ons like CAPI allow you to map and coherently share physical CPU address space with FPGAs or GPUs. If I could get those things in a reasonable workstation configuration, I'd probably go for it tbh.

(I'd be more than willing to repeat this and post some more accurate numbers if anyone cares. I also need to get around to benchmarking AES-NI vs that POWER8 machine's _actual_ dedicated AES unit. The benchmark above was only flexing its vector/integer unit capabilities. ;)
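For anyone who wants to compare the cycles-per-byte (cpb) figures quoted above on a common scale, here is a rough sketch that converts them to single-core throughput. It assumes throughput is simply clock rate divided by cpb, and it uses the approximate ("IIRC") numbers from the comment, not fresh measurements. The function name `throughput_gb_per_s` is made up for illustration.

```python
def throughput_gb_per_s(clock_ghz: float, cycles_per_byte: float) -> float:
    """Approximate single-core throughput in GB/s.

    clock_ghz is cycles per nanosecond, so dividing by cycles per byte
    gives bytes per nanosecond, which is the same as GB/s (10^9 bytes/s).
    """
    return clock_ghz / cycles_per_byte


# Approximate figures quoted in the comment (assumptions, not new benchmarks).
quoted = {
    "POWER8 3.0GHz, ChaCha20 in software": (3.0, 1.3),
    "Xeon 3.5GHz, AES-256 with AES-NI": (3.5, 1.0),
    "Xeon 3.5GHz, ChaCha20 in software": (3.5, 5.0),
}

for name, (ghz, cpb) in quoted.items():
    print(f"{name}: {throughput_gb_per_s(ghz, cpb):.2f} GB/s")

# Ratio of cycles spent per byte by the same C code on the two machines.
cpb_ratio = 5.0 / 1.3
print(f"Xeon vs POWER8 cycles per byte for ChaCha20: {cpb_ratio:.2f}x")
```

Note that cpb already normalizes out the clock speed, so the roughly 4x gap between 5cpb and 1.3cpb is a per-cycle difference, not a clock-rate one.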


If you're getting a 4x difference in IPC using a crypto microbenchmark from compiled C code (i.e. it doesn't sound like you're bandwidth or I/O limited), there has to be something else at work. POWER8 is a nice core, but it's not that wide. Maybe the compiler was recognizing your operations and replacing them with AES primitives?


Caches and memory latency/bandwidth can have serious effects as well.


Yes, but at this kind of multiplier only in the case where the entire test is 100% cache-resident on one CPU and spilling on the other. Crypto stuff tends to have small working sets, so my intuition is that it's got to be something else.


An ASM-optimized ChaCha20 is faster than AES-NI on newer Intel chips.


> Equivalent x86 Linux servers are 1/4 the price.

You're severely underestimating the cost of dual-proc 16-core Xeons (about $3500 each for the E5-2698v3), and by the time you add memory, storage, I/O, networking, and other necessities, you're easily in the $15-20k range.

Source: I work for an integrator.


Just to clarify: P-series (Power/POWER8) Linux is not the same as what is announced here. LinuxONE runs on the System z (mainframe / s390x) platform.


Yes, but they now share the same microarchitecture. s390x is mostly a difference in the microcode.



"Equivalent x86 Linux servers are 1/4 the price."

That doesn't sound right to me. What are you considering an equivalent x86 machine?


Hypervisor is built-in. Up to 2x faster clock per core than x86. Double the cache. Built-in decimal support is great for financial calculations. Security advantage in that just about every piece of malware and every attack tool is written for x86, with some attention shifting to ARM.
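To illustrate why decimal arithmetic matters for financial calculations, here is a small sketch using Python's standard `decimal` module. This is a software implementation, used here only as a stand-in to show the behavior; it says nothing about how the hardware decimal unit is exposed or how fast it is.

```python
from decimal import Decimal

# Binary floating point cannot represent 0.1 or 0.2 exactly,
# so the sum is not exactly 0.3.
binary_sum = 0.1 + 0.2
print(binary_sum == 0.3)  # False

# Decimal arithmetic represents these amounts exactly.
decimal_sum = Decimal("0.10") + Decimal("0.20")
print(decimal_sum == Decimal("0.30"))  # True
print(decimal_sum)  # 0.30
```

With binary floats, errors like this accumulate over many additions of currency amounts, which is the problem decimal arithmetic avoids.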



