Only if you program with atoms. If you program with arrays then it is my experie...

pjc50 · on July 24, 2015

I don't suppose you could elaborate on what you mean by atoms vs arrays here? I've not heard anyone say "atom" outside of a LISP context.

A fairly random example of OO "business logic": https://git.eclipse.org/c/b3/b3.git/tree/org.eclipse.b3.aggr...

(I went clicking through Eclipse because I knew I could count on it to have lots and lots of this kind of abstraction-heavy bland Java. I'm not even saying this particular bit of code is slow, it's just that there's a lot of OO code that looks vaguely like this.)

vvanders · on July 24, 2015

I believe what he's referring to is also called AoS(Arrays of Structures) vs SoA(Structure of Arrays).

If you're iterating over one or two values in a struct via AoS it's very painful from a memory caching standpoint since you're only getting sizeof(member) / sizeof(struct) efficiency.

In a SoA situation all your data for members is tightly packed so you usually get better cache coherency.

That said the best approach is actually looking at your data access patterns and packing your data appropriately. Unfortunately some languages don't have value types(most managed languages except C#). This is why C/C++ is usually faster than managed languages, they can get close on generated code but can be off by a factor of 50x in memory access performance.

pron · on July 24, 2015

> This is why C/C++ is usually faster than managed languages, they can get close on generated code but can be off by a factor of 50x in memory access performance.

This is a common misconception that is a result of Java's current state. Control over memory layout and garbage collection are two completely orthogonal issues. At the moment, Java just happens to be both GCed and to afford little control over memory layout. Currently, the HotSpot team (HotSpot is the name of OpenJDK's JVM) is working on project Valhalla and project Panama, two efforts scheduled for Java 10 and intended to give the JVM all the memory layout control you need.

vvanders · on July 24, 2015

Not completely.

Most GCs want to be able to compact the heap, that means they lean in favor of reference based models by default.

It's not exclusive like you mention but to my knowledge C# is the only language with explicit guarantees. I was aware of the effort around Java's value types, however how long till we can reasonably see this in production?

You can do this in Java today but it involves lots of nasty ByteBuffer manipulation(in which FlatBuffers is fantastic for). You'll still pay for the conversions from bytes to the actual types you want.

If you're dealing with these types of performance problems it's best to treat them with a language that's well suited to deal with them. There's nothing wrong with using Java for higher level business logic and delegating the heavy lifting to a stack that's designed to deal with them.

pron · on July 24, 2015

> Most GCs want to be able to compact the heap, that means they lean in favor of reference based models by default.

I don't see how this follows. Copying GCs do compact the heap, but when you use values you're basically saying "these two things go together". You're only making life easier for the GC (well, you're also creating larger objects, but that might just require a small adjustment of the GC strategy).

> how long till we can reasonably see this in production?

Four years probably... Still, that doesn't change the fact that layout and GCs are orthogonal.

vvanders · on July 24, 2015

Expanding on my point a bit, I'm not arguing that in theory GC and memory layout are orthogonal.

Just that in practice you don't see memory layout in languages that are managed. Since I'm usually not in a habit of building a new languages to solve problems it's something worth understanding when making a technical selection on a language.

pjmlp · on July 24, 2015

D, Modula-3 and the Oberon family are two examples of value types by default with a GC enabled heap.

Eiffel also offers value types (aka expanded), but it is reference by default.

pjmlp · on July 24, 2015

> This is why C/C++ is usually faster than managed languages, they can get close on generated code but can be off by a factor of 50x in memory access performance.

This only happens because the Pascal branch of languages with GC, sadly failed in the mainstream (for several reasons).

Modula-3, Eiffel, Oberon, Oberon-2, Component Pascal all offer the same data access patterns and packing available to C and C++.

geocar · on July 25, 2015

Many programming languages put boxes around the bits that the CPU actually operate on. These boxes often store things like reference count and type (or a dispatch table for object-oriented languages with single-dispatch). That thing inside a box that is the actual value is called an atom.

Those boxes have a lot of overhead (in PHP it's around 140 bytes!), and because a major identified problem is the waste of valuable cache space and CPU time book-keeping these boxes, a lot of JIT and compiler research is about how to eliminate the boxes.

You can see why they do this when considering a program like:

    $a = array();
    for($i = 0; $i < 100000; ++$i) $a[$i] = rand()*65536;
    sort($a);

which might take 200k (and fits easily in cache) in C/C++ but 15 megabytes (and doesn't) in PHP.

However.

Eliminating those boxes has proven to be very difficult, and while I have heard great things about Java 10, I am reserved because I also heard great things about Java 5. And Self. I do not expect JIT/compilers will get smart enough to actually solve this problem in my lifetime.

On the other hand, array programming languages allow you to get that space-overhead that C/C++ has in an interactive (and usually interpreted) language, so it so happens that you don't have to choose between a nice interpreted programming language with introspection, automatic memory management, and so on, and high performance, but it does mean you have to program the arrays instead of the atoms.

I do not understand how that source file is "OO business logic".

vvanders · on July 24, 2015

Take a look a Mike Acton's talks, he does a pretty good job of breaking down why OOP just isn't built for high performance(arranging data by logical association rather than data access patterns).

Yeah, your metapoint is is worthwhile, all the JIT in the world aint gonna help you if you're cache missing every damn fetch(hello Java).

corysama · on July 24, 2015

> all the JIT in the world aint gonna help you if you're cache missing every damn fetch(hello Java).

Acton (sort of) has a talk about that too. Except he speaks about traditional, ahead-of-time compilers. Tldr is that because of how the vast majority of programmers arrange their data, CPUs spends the vast majority of their time stalling on memory. Therefore improvements in optimizing compilers can only marginally help the marginally small section of running time spent actually exercising the ALUs. The solution is to devote more effort to changing our habits regarding data layouts.

vvanders · on July 24, 2015

Yup, shaving 5-10 cycles from compiler optimizations doesn't help you when you've got a deficit of 200+ cycles per cache miss.

Working with FPGAs and DRAM really makes a lot of reasons behind this and lot of other "obscure" performance issues like pipelining crystal clear.

wiml · on July 24, 2015

OOP encourages this kind of memory layout, but I don't think it requires it. I don't think there's anything preventing, say, a JVM or a CLOS implementation from storing "hot" members/fields in compact homogeneous arrays and "cold" ones in more traditional heap-allocated structures, perhaps with the help of a few programmer annotations. Or a tracing VM cooperating with a copying GC to perform this kind of optimization on the fly. This seems like it could make an interesting CS research project, in the (unlikely) case that it isn't one already.

corysama · on July 24, 2015

That is an interesting feature in Jon Blow's "Jai" language project. Despite being a C-like language, it sets you up to be able to move members between hot and cold storage easily without modifying existing code that references those members.

You can effectively define a struct as references to separate sub-structs in separate arrays. Member access syntax can be flattened so that it doesn't matter in the syntax which sub-struct a particular field is in. Realize that you aren't using a field as often as you expected? Move it's declaration from the hot sub-struct to the cold one and recompile. That will move all instances of that field from the hot array to the cold array, but no other code needs to be edited.