I'm considering building a Java compatible VM that only has static memory alloca...

pron · on Dec 23, 2022

> Why hasn't this been done by anyone yet?

That depends what you mean by "this." The Java spec requires you to support `new A()` by returning a fresh object or throwing a VMError (like OutOfMemoryError). If by "this" you mean that allocations would succeed until some fixed amount of memory is exhausted, then, as others have pointed out, this has been done even in OpenJDK. If by "this" you mean that the allocation of some particular set of objects -- say, only those allocated during class initialisation -- would succeed regardless of memory consumption and all others would fail with an OutOfMemoryError, then I guess that it hasn't been done in that particular way because people haven't found it particularly useful as most Java programs would fail, but you can give it a try.

RTSJ, the specification for hard-realtime Java [1][2], actually goes further than that and supports both "static" memory (ImmortalMemory) and arenas (ScopedMemory). So if that's what you mean, then it has been done.

[1]: https://www.rtsj.org/specjavadoc/book_index.html

[2]: https://www.aicas.com/download/rtsj/rtsj_76.pdf

lmm · on Dec 23, 2022

> Why hasn't this been done by anyone yet?

What would be the point? Java programs assume GC (or at least, they assume they can allocate memory and not worry about when it will be freed). If you don't have GC then you're not going to be compatible with extant Java programs, so what's the point in trying to be "Java compatible" at all?

bullen · on Dec 23, 2022

Because 1) you don't want GC pauses 2) you don't want GC code to bloat the VM 3) You don't want memory leaks 4) You don't want people to not know what they are allocating 5) Static allocation is enough to make anything.

6) int arrays do not have cache misses (but you might need to pad them to avoid cache invalidation) 7) parallel atomic multicore works on int arrays out of the box!

sorokod · on Dec 23, 2022

I assume that by "you" you mean "I"

From my perspective I am happy with the tradeoff of hanving gc pauses and not needing to manage memory manually.

kaba0 · on Dec 23, 2022

1) Java has the very best GCs out of anything to the point that I very much question anyone claiming to suffer from GC pauses on most workloads. Do you really have an application which suffers from that or is that just the usual “GC bad” mantra? Your average C program may spend just as much time trying to malloc a given block in its fragmented heap than that.

2) Why exactly? 3) that’s why you have a GC. But if I really want to understand what you mean (I assume you meant non-memory resources not being explicitly closed?), try-with-resources and cleaners solve the problem quite well.

4) Does it really matter if the last 2 decades were spent on optimizing allocation to the point where it is literally a pointer bump and like 3 basic, thread-local instructions? Deallocation can be amortized to be practically zero cost (moving GC), at RAM’s expense. Where it really really matters though, you could always do ByteBuffers, the new MemorySegment’s or just straight sun.misc.Unsafe pointer arithmetics. That string allocation will be escape analyzed and be stack allocated either way.

I don’t even know what do you mean by static allocation. You mean like in embedded, having fix sized arrays and exploding when the user enters a 32+1 letter text? I really don’t miss that. 6) value classes are coming and solving the issue. Though it begs the question, what’s the size of your average list? Also, how come it doesn’t matter for all the linked lists used extensively in C? Also, see point 4. 7) I don’t get what you mean here, do you mean not doing atomic instructions, because java don’t have out-of-thin-air values? There are rare programs that can get away with that, but I think Java has quite a great toolkit for synchronization primitives to build everything (most concurrency books use it for a reason).

bullen · on Dec 23, 2022

Yes, I mean having limits on everything! Every input has a max value. Players, messages count, message length, weapons they can hold etc. You can add more by recompiling, you should reuse them (for size but also to keep data contiguous) even if that is challenging under multicore utilization and you can add MUUUUUCH more than the CPU can handle to compute anyway so this is not an issue.

The first and last bottleneck of computers is and will always be RAM. Both speed, memory size and energy. A 256GB RAM stick uses 80W!!!! Latency is increasing since DDR3 (2007) and we have had caches to accommodate for slow RAM since 386 (1985) (3 always seems to be the last version, HL3 confirmed? >.<):

You need to cache align everything perfectly: 1) all data has to be in an array (or vector which is a managed array but i digress) 2) you need your types to be atomic so multiple cores can write to them at the same time without segmentation fault (int/float). 3) You need your groups/objects/structs to perfectly fill (padded) 64 bytes. Because then multiple cores cannot invalidate each others cache unless they are writing to the same struct.

So SoA vs. AoS never was an argument! AoS where structures are exactly 64 bytes is the only thing all programmers must do for eternity! This is the law of both X86 and ARM.

So an array of float Mat4x4 is perfect and I suspect that is where the 64 bytes came from. But here is another struct just as an example:

  struct Node {
    int mesh, skin;
    Vec3 spot, pace;
    Quat look, spin;
  };

kaba0 · on Dec 23, 2022

But things don’t fit into 64 bits all the time, and then you get tearing. This is observable and now you have to pay for “proper” synchronization. Also, apple’s m1’s reason for speed is pretty much bigger cache, so I don’t think it’s a good choice to go down this road.

Most applications have plenty of objects all around that are rarely used and are perfectly fine with being managed by the GC as is, and a tiny performance critical core where you might have to care a tiny bit about what gets allocated. This segment can be optimized other ways as well, without hurting the maintainability, speed of progress etc of the rest of the codebase.

bullen · on Dec 23, 2022

64 bytes, 512 bits.

Cachelines are 64 bytes on all modern hardware.

They will probably never change this value ever.

Everything fits into 64 bytes if you make the effort.

And if it doesn't you have to use two Arrays of 64 byte Structures and pad the last.

This is non negotiable and I'm completely baffled nobody has mentioned this yet.

I call this law: Ao64bS (did I invent my first law?) :D

kaba0 · on Dec 23, 2022

It may be a language barrier thingy, but then we are talking about different things. Also, that is architecture dependent.

bullen · on Dec 23, 2022

Nope on all modern CPUs (X86 and ARM) this is 64 bytes and has been since a looong time...

astrange · on Dec 23, 2022

Cache lines are 128 bytes on M1.

But since it’s AMP and not SMP, sharing work across cores doesn’t necessarily work how you expect it to.

bullen · on Dec 23, 2022

Can you ask the OS to give you a certain core type?

128 bytes is perfect 2 x 64! So even if the risk of cache invalidation goes up even if two cores are not writing to the exact same structure the alignment still works!

Good job Apple!

fisf · on Dec 23, 2022

There absolutely are modern systems that have e.g. 128 byte cache lines (M1).

pirocks · on Dec 24, 2022

> A 256GB RAM stick uses 80W!!!!

This seems very high to me, since most high capacity server sticks have no heatsink. Have you got a source?

lmm · on Dec 25, 2022

None of that answers my question. Why would you want your VM to be Java compatible if it can't run Java programs? What are you even going to run on this VM, given that the only mainstream application languages without automatic memory management at runtime are C++ and Rust, to the extent that the latter qualifies as mainstream?

bullen · on Dec 25, 2022

Javac is the point. To be able to code for my own engine without seg. faults. And to build something long term that improves slowly over time.

trimbo · on Dec 23, 2022

Why wouldn't you use C++?

bullen · on Dec 23, 2022

1) You cannot deploy C++ across different architectures/OSes without recompile.

2) VMs avoid crashes upon failures that cause a segmentation fault in native opcodes, with a VM you can keep the process from crashing AND get exact information where and how the problem occurred avoiding debug compiles with symbols and reproducing the error on your local computer.

Right now I use this with my C++ code to get somewhere near the feedback I get from Java (but it requires you to compile with debug): http://move.rupy.se/file/stack.txt

The question you really should ask is why are people using C++? Performance is only required in some parts of engines, to have a VM without GC on top should be default by now (50 years after C and 25 years after Java).

wtetzner · on Dec 24, 2022

Any reason not to do something like Rust + Wasm? Seems like it’d be a better fit.

I don’t have anything against Java, but it seems like you lose a lot of the benefits of using when you take away allocation.

bullen · on Dec 26, 2022

C + WASM maybe but then I'm pretty sure the compiler will not be as good as javac to tell me about things that are wrong.

Java + WASM is probably what I'll end up with, don't like the AoT step from a distance we'll see.

I'm looking at all Risc/stack op/byte-code, and it's disheartening in how many ways humans can copy the same thing differently.

C#, RISC-V, Java, WASM, ARM, 6502 ASM, uxn, lox the list goes on and on...

singularity2001 · on Dec 23, 2022

> Java programs assume GC

only users of long running programs assume GC, the program itself doesnt care.

marginalia_nu · on Dec 23, 2022

The programmer most likely assumes short-lived objects are "free" (as they effectively are in Java), and can be allocated in loops and so on without filling up memory and without incurring any real penalty to performance (via TLAB or possibly elision).

grashalm · on Dec 23, 2022

Nobody did that because java is a safe language by design. Manual allocation would make it an unsafe language. For the vast majority of applications correctness wins over performance.

Short lived objects are extremely cheap on the JVM with the right GC. Almost stack allocation cheap. So if you write code that avoids too many long lived objects there is almost no overhead to a GC.

For long lived objects GC based compaction can give you even some performance advantage over manual allocation due to better memory locality. But that heavily depends on the application of course.

In my experience people blame the GC way too early. Most often the application is just poorly written and some small tweaks can fix gc spikes.

bzzzt · on Dec 23, 2022

Because there's no need? There the 'epsilon' no-op GC in OpenJDK: https://openjdk.org/jeps/318

If you keep your object allocation in check you can use this to guarantee no GC pauses.

bullen · on Dec 23, 2022

Yes, but then you are downloading and running executables that contain the GC.

bzzzt · on Dec 23, 2022

The Java "GC" also contains the memory allocator and handling for out of memory conditions, so if you want to use Java there needs to be 'something' there to call.

karussell · on Dec 23, 2022

Isn't this already possible with -XX:+UseEpsilonGC? (disables the GC) And then you manage memory manually in big (primitive) arrays.

bzzzt · on Dec 23, 2022

Beat me to this answer ;) - I presume you can just allocate objects but you have to keep those allocations in check to prevent the JVM from terminating when going out of memory.

Maybe object pooling (which helped performance in old JVM's in the 90s) will make a comeback? ;)

lukeramsden · on Dec 23, 2022

Object pooling, mutable objects, flyweight encoding[0], and being allocation-free on the steady state are all alive and well in latency-sensitive areas like financial trading, plenty of which is written in Java

[0] https://github.com/real-logic/simple-binary-encoding/wiki/De...

bullen · on Dec 23, 2022

I'm going to try and prevent heap allocation in runtime. But since I'm going to use javac it's going to be ugly.

Basically only static atomic arrays (int/float) in classes will be allowed, for cache and parallelism, and AoS up to 64 bytes encouraged to avoid parallel cache invalidation.

And I'm even considering dropping float, and only have integer fixed point... but then I'll need to convert those in shaders as GPUs are hardcoded to float.

garblegarble · on Dec 23, 2022

Not to be too negative here but... why? You'd be creating something syntax-compatible with Java, but where you wouldn't be able to use any existing Java code & unable to use most of the interesting features of Java (even string concatenation is handled by instantiating a new StringBuilder on your behalf by javac). Aren't you just reinventing a worse C at that point?

bullen · on Dec 23, 2022

See my other 2 responses.

On mobile here would reference on PC.

usrusr · on Dec 23, 2022

What would be the point of that things relationship with java though? Is it somehow important that the bytecode run by a tiny vm that is apparently distributed alongside could also run on a proper JVM?

bullen · on Dec 23, 2022

Javac, I don't want to write a compiler.

The bytecode is only meant to run in my VM.

tryfinally · on Dec 23, 2022

Not many people are motivated to take optimization this far in the Java world.

On the .NET side, the language is more suitable because you can define your own value types (coming soon in Java AFAIK), and there is a lot of people using C# for game dev which is a big use case for GC optimization.

Unity has been integrating unmanaged allocators in their engine, which lets developers skip the GC much more easily (without having to manually preallocate and reuse memory, which isn’t anyone’s favorite workflow, and doesn’t save you much compared to a fast native allocator). I’ve also seen a couple of roughly equivalent projects, eg. Smmalloc-CSharp, github.com/alaisi/nalloc.

imhoguy · on Dec 23, 2022

You can check Java Card to see how GC-less low memory programing was sorted out there. https://en.m.wikipedia.org/wiki/Java_Card

jffhn · on Dec 23, 2022

>Why hasn't this been done by anyone yet?

I used to do Java apps which didn't allocate memory in the long run, and so did some banks for high frequency trading, by reusing objects on the spot or using pools for objects with a life cycle. It imposes some programming style/patterns, in particular for APIs design, but it's perfectly doable, so I guess people prefer to just do that. In what context that wouldn't be enough? For proof?

wtetzner · on Dec 23, 2022

At that point, why even use Java?

bullen · on Dec 23, 2022

See my other answers...

kasperni · on Dec 23, 2022

From another comment > Meaning you cannot type new in a method only in the class definition.

How would you handle the ~ 1000 java.x classes that even a minimal HelloWorld programs uses. There are plenty of random allocation.

bullen · on Dec 23, 2022

I'm not writing thus for old code, I'm writing it for new code. Specifically 3D action MMO scripting.

beardyw · on Dec 23, 2022

Not sure how that would play out. Just not running the GC is an unsatisfactory answer. Otherwise what?

bullen · on Dec 23, 2022

Basically you enforce static heap allocation. Meaning you cannot type new in a method only in the class definition. I don't even know if this can be enforced from the bytecode VM... just playing with it in my head.

pron · on Dec 23, 2022

That would violate the Java specification, i.e. it's not Java. On the other hand, what you could do is only allow allocations to succeed in class initialisation (Java's "static") and throw an error if done outside it. Note, however, that even adding an element to a Map outside of initialisers would fail when using OpenJDK's standard library even if both key and value have been preallocated in initialisers, as it may allocate an internal node and/or a new hash array. So you may want to change some of the standard library to make it more useful with your restrictions.

But you may want to ask yourself why you want to do that. OpenJDK's GCs have become really, really good in both throughput and latency, and the main thing you pay in exchange is memory footprint.

ako · on Dec 23, 2022

So the only place you really define what objects get created is in the class containing the main method? Seems fairly limited that you’d have to know upfront what your memory needs are? An app would have to allocate a large block of memory in the main class, and then handle memory management itself from this piece of memory?

bullen · on Dec 23, 2022

Yes, you would have to overallocate and reuse.

This is the way I code C now allready.

ako · on Dec 23, 2022

That would remove a lot of the benefits of using Java, might as well use something like c. Garbage collection has been highly optimized in the jvm, and most would consider it a benefit of a jvm compared to managing memory yourself.

the_only_law · on Dec 23, 2022

I’ve never understood it either. I’ve heard people doing similar things with Java in the low-latency trading area.

I mean Java isn’t the worst tool out there, but even C# outclasses it as a language. The biggest plus for Java is the massive, mature ecosystem which I imagine mostly evaporates when you’re only using a self restrictive subset of the language itself.

kaba0 · on Dec 23, 2022

It doesn’t evaporate, more often than not only a small subset of these trading programs need that strict “no heap allocation” policy. The rest of the program is free to take advantage of the huge ecosystem.

invalidname · on Dec 23, 2022

I have a bit of a different idea along somewhat different lines. Tweeted at you.

marginalia_nu · on Dec 23, 2022

Why do you want to avoid GC?

bullen · on Dec 23, 2022

See above.