> I do appreciate the compliment that us embedded software engineers aren't "mere mortals" ;-)
With all due respect to embedded software, the stuff you have to deal with is limited in scope. I'm not saying it isn't hard or challenging, but it is nonetheless limited in scope. And this matters: I've also worked a lot with C/C++, and while C/C++ developers can easily reason about things like cache locality, branch prediction, or data structures, if you want to stump them in an interview, the easiest thing to do is ask them to do some string processing. The point of having high-level abstractions is to build bigger, more complex things, and with C/C++ the pain starts right from the get-go.
> even for people who are experienced with it it takes a bit longer to write robust code in C than it does in a higher level language, but the performance benefits can be enormous
Err, no, not really. Tackling multi-threading issues in C is absolutely horrible, compounded by the fact that C and C++ didn't have a standard memory model of their own (until C11/C++11), so you end up with really hard-to-reproduce bugs just from updating some packages on the host OS. Libevent has always been a "joy" to work with in a multi-threaded context, the result being a whole generation of insecure and freakishly fragile servers.
On top of the JVM you've got a sound memory model, you've got multi-threading done right, you've got async I/O without headaches, and as far as concurrency models are concerned, you can take your pick: lightweight actors, futures/promises, software transactional memory, parallel collections, map/reduce and so on, without any of the associated headaches.
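To make that concrete, here's a minimal Java sketch of two of those options out of the box: composing futures/promises with `CompletableFuture` and letting the runtime partition a parallel computation across cores, with no manual locking in either case (class and variable names are mine, just for illustration):

```java
import java.util.concurrent.CompletableFuture;
import java.util.stream.LongStream;

public class JvmConcurrency {
    public static void main(String[] args) {
        // Futures/promises: compose async stages declaratively;
        // the library handles scheduling and visibility guarantees.
        CompletableFuture<Integer> f = CompletableFuture
            .supplyAsync(() -> 21)     // runs on a pool thread
            .thenApply(n -> n * 2);    // chained transformation
        System.out.println("future: " + f.join());

        // Parallel collections: the runtime splits the range across
        // available cores; no threads or locks written by hand.
        long sum = LongStream.rangeClosed(1, 1000).parallel().sum();
        System.out.println("sum: " + sum);
    }
}
```

The point isn't the toy arithmetic; it's that both styles come with well-defined memory-visibility semantics, so the hard part of multi-threading is handled by the platform rather than by convention.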
There's also the issue of using all the hardware you have. We are only scratching the surface of what can be done with GPUs, but a high-level language running on top of the JVM or .NET makes it feasible to use the available GPUs or other hardware resources through high-level DSLs. The Liszt DSL is a really cool example of what I'm talking about: http://liszt.stanford.edu/
So if by "performance" you mean measuring the time it takes for a while-loop to finish, then sure, C is the best; but go beyond that to efficient usage of all available resources and the problem is not really black and white.
> you would be spending ages optimising a Scala implementation
That's not true in my experience. At the very least, with the JVM you can quickly and easily attach a profiler to any live production instance. As far as profiling and debugging tools are concerned, the JVM is the best. And that's where the real difference comes from, in my real-world experience: optimizing without real profiling is premature optimization, and low-level optimizations often get in the way of higher-level architectural optimizations that bring bigger benefits.
Speaking of spending ages on stuff on the server side, good luck debugging what happened from a 3 GB core dump generated on a segfault, because somebody thought that an Int was actually a Long.
Also, just last week I tested our servers on Amazon's c1.xlarge instances, which have to run a 64-bit OS; those servers normally run on c1.medium instances, which are very cost-efficient but better left running a 32-bit OS. In 15 minutes I was able to migrate from 32-bit to 64-bit. Good luck doing that with C/C++.