Agreed, I'm curious as well. We load-tested with faux users on real clients, up to 1 million concurrent, and only stopped at 1 million because the test was becoming cost-prohibitive.
Under Performance: per watt, the fuse/rupy platform completely crushes the competition for real-time action MMOs, for two reasons:
- Event-driven protocol design, averaging about 4 messages/player/second (which means you cannot do spraying or headshots, for example; in my game-design opinion that is a feature, not a limitation).
- Java's memory model, with atomic concurrency over shared memory, which needs a VM and GC to work (C++ copied that memory model in C++11, but it failed completely because C++ lacks both a VM and a GC; that model is still, to this day, the one C++ uses). You can read more about this here: https://github.com/tinspin/rupy/wiki
These keep the server's internal latency below maybe 100 microseconds at saturation, which no C++ server can come close to unless it copies Java's memory model and adds a VM + GC so that all cores can work on the same memory at the same time without locking!
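A minimal sketch of what "all cores working on the same memory without locking" looks like in Java (my own example, not rupy's actual code; the class and key names are invented): many threads mutate one shared map with no explicit locks in user code.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.LongAdder;

public class SharedState {
    public static void main(String[] args) throws InterruptedException {
        // One map shared by all worker threads, no synchronized blocks.
        ConcurrentHashMap<String, LongAdder> hits = new ConcurrentHashMap<>();
        Thread[] workers = new Thread[8];
        for (int i = 0; i < workers.length; i++) {
            workers[i] = new Thread(() -> {
                for (int j = 0; j < 100_000; j++) {
                    // computeIfAbsent + LongAdder: lock-free on the hot path.
                    hits.computeIfAbsent("messages", k -> new LongAdder()).increment();
                }
            });
            workers[i].start();
        }
        for (Thread t : workers) t.join();
        System.out.println(hits.get("messages").sum()); // prints 800000
    }
}
```

Whether this beats a hand-tuned C++ equivalent per watt is exactly the open question of the thread; the point is only that the JVM makes this style cheap to write.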
You can argue those are bad arguments, but if you look at performance per watt, with some consideration for developer friendliness, I'm pretty sure that in 100 years we will still be writing minimalist Java SE (or some Oracle-free copy) on the server and vanilla C (compiled with a C++ compiler, gcc/cl.exe) on the client to avoid cache misses.
None other than the one linked in the comment above. I have been reaching out to EVERYONE, and nobody can explain this to me, but I'll implement it myself soon so I can explain it.
My intuition tells me the VM provides a layer decoupled from the hardware memory model, so there is less "friction", and the GC is required to reclaim shared memory that C++ would need to "stop the world" to reclaim anyhow (all concurrent C++ containers leak memory; see TBB's concurrent_hash_map, for example). That means the code executes slower, BUT the atomics can work better.
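To make the reclamation point concrete, here is a small illustration of my own (not from TBB or rupy): in Java, a value removed from a shared map stays valid for any thread still holding a reference, and the GC frees it only after the last reference is gone. In C++ the remover has to decide when `delete` is safe, which is the hard problem (hazard pointers, epochs) that concurrent containers struggle with.

```java
import java.util.concurrent.ConcurrentHashMap;

public class ReclaimDemo {
    public static void main(String[] args) {
        ConcurrentHashMap<Integer, int[]> shared = new ConcurrentHashMap<>();
        shared.put(1, new int[]{42});

        int[] stillInUse = shared.get(1); // a "reader" keeps a reference
        shared.remove(1);                 // removal does not invalidate it

        // Safe: the GC reclaims the array only once nobody references it.
        System.out.println(stillInUse[0]); // prints 42
    }
}
```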
As I said: for 5 years I have been searching for answers from EVERYONE on the planet and nobody can answer. My guess is that this is so complicated only a handful can even begin to grok it, so nobody wants to explain it because it would create a lot of wasted time.
The usual reaction is: Java is written in C, so how can Java be faster than C? Well, I don't know how, but I know it's true because I use it!
So my answer today is: Java is faster than C if you want to share memory directly between threads efficiently, because you need a VM with a GC to make the Java memory model (which everyone has copied, so I guess it must be good?) work!
But no guarantees... you never get those with C/C++. I stopped downloading C/C++ code from the internet unless it has 100+ proven users, so stb/ttf and kuba/zip are my only dependencies.
> My intuition tells me the VM provides a layer decoupled from the hardware memory model, so there is less "friction", and the GC is required to reclaim shared memory that C++ would need to "stop the world" to reclaim anyhow (all concurrent C++ containers leak memory; see TBB's concurrent_hash_map, for example). That means the code executes slower, BUT the atomics can work better.
I dunno about the GC bits; after all, object pools are a thing in C++, so you have a consistent place (getting a new object) where reclamation of unused objects can be performed.
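That "consistent place" can be sketched like this (a hypothetical pool of my own naming, written in Java for brevity; the C++ shape with a free list is the same): acquire() is the one spot where unused objects are recycled instead of being freed individually.

```java
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.function.Supplier;

class Pool<T> {
    private final ConcurrentLinkedQueue<T> free = new ConcurrentLinkedQueue<>();
    private final Supplier<T> factory;

    Pool(Supplier<T> factory) { this.factory = factory; }

    T acquire() {
        T t = free.poll();                    // reuse a recycled object...
        return t != null ? t : factory.get(); // ...or make a fresh one
    }

    void release(T t) { free.offer(t); }      // hand back for reuse
}

public class PoolDemo {
    public static void main(String[] args) {
        Pool<StringBuilder> pool = new Pool<>(StringBuilder::new);
        StringBuilder a = pool.acquire(); // fresh object
        pool.release(a);
        StringBuilder b = pool.acquire(); // recycled: same instance as a
        System.out.println(a == b);       // prints true
    }
}
```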
I think it might be down to mutex locking. In a native program, a failure to acquire a mutex causes a context switch by performing a syscall (the OS steps in, flushes registers, cache, everything, and runs some other thread).
In a VM language I would expect that a failure to acquire a mutex can be profiled by the VM with simple heuristics (only one thread waiting for the mutex? Spin until it's released. More than five threads in the wait queue? Run some other thread).
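The heuristic I mean can be sketched like this (my own toy version, not HotSpot's actual adaptive locking; the thresholds are arbitrary): spin briefly while contention is light, and back off to the scheduler when the wait queue grows.

```java
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicInteger;

class AdaptiveSpinLock {
    private final AtomicBoolean held = new AtomicBoolean(false);
    private final AtomicInteger waiters = new AtomicInteger(0);

    void lock() {
        waiters.incrementAndGet();
        int spins = 0;
        while (!held.compareAndSet(false, true)) {
            if (waiters.get() <= 2 && spins++ < 1_000) {
                Thread.onSpinWait(); // light contention: burn a few cycles
            } else {
                Thread.yield();      // heavy contention: run someone else
            }
        }
        waiters.decrementAndGet();
    }

    void unlock() { held.set(false); }
}

public class LockDemo {
    public static void main(String[] args) throws InterruptedException {
        AdaptiveSpinLock lock = new AdaptiveSpinLock();
        int[] counter = {0};
        Thread[] ts = new Thread[4];
        for (int i = 0; i < ts.length; i++) {
            ts[i] = new Thread(() -> {
                for (int j = 0; j < 10_000; j++) {
                    lock.lock();
                    counter[0]++;    // protected by the lock
                    lock.unlock();
                }
            });
            ts[i].start();
        }
        for (Thread t : ts) t.join();
        System.out.println(counter[0]); // prints 40000
    }
}
```

Real JVMs do something in this spirit (biased/adaptive locking) inside the runtime, where the profiling information lives, which is the advantage I'm pointing at.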