The Generic Dilemma (2009) (swtch.com)
30 points by mpweiher on March 21, 2021 | 19 comments


As mentioned in the comments: "Maybe take a peek at Ada Generics. The generic instances are shared, and there is no boxing of types."

RM12: https://www.adaic.org/resources/add_content/standards/05rm/h...

RAT83: http://archive.adaic.com/standards/83rat/html/ratl-12.html


I think C# generics also work this way, at least to an extent. For example, a single instance of generic hashmap code can store any value as long as it's one word in size, which covers the most common integer, floating-point, and pointer types.

EDIT: as many comments on that post say, sorry!


> The generic dilemma is this: do you want slow programmers, slow compilers and bloated binaries, or slow execution times?

Now that the Go team has approved a design for generics, which of these did they choose? Or did they “somehow manage to avoid all three of these bad outcomes”?


From what I could tell, they specifically worked out the current design in a way that allows them to change the implementation later. The current WIP code, as far as I know, mostly just compiles generic Go code into regular Go code, so it's closer to the "slow compilers" method, but the eventual goal seems to be some form of hybrid approach[1].

EDIT: Here[2] is a direct link to what seems to be the most recent public DRAFT of the design document of the hybrid approach in question.

[1]: https://groups.google.com/g/golang-dev/c/OcW0ATRS4oM

[2]: https://go.googlesource.com/proposal/+/78bd52518d53994d3fe66...


Swift has tried to address this by making specialization an optimization. When it compiles a generic function, it emits a single boxed implementation, like Java. But the optimizer may also choose to specialize it for concrete types, like C++ or Rust does.

It sounds nice, but there's a huge performance difference between the boxed and specialized versions. When you make a function generic, the compiler boxes values, conservatively emits retain/release, and so on.

The difference is so large that it justifies giving the programmer levers to control specialization. The Swift team has added "@_specialize" as a hint; I think they should go further.

What I don't know is how specialized generics can participate in library evolution. If a dynamic library provides a sort routine, and you specialize it, what happens when the library changes? This is where a JIT or the .NET model (compile at install time) can pay off. Apple has laid some groundwork for this (bitcode, Rosetta 2) so they can certainly get there if they choose to.

Anyways I think the real dilemma is whether generics are meant to be a performance optimization (C++, Rust) or just for typechecking (Java, C#). The cost of saying yes to performance is library evolution.


None of the example languages (C, C++, Java) have serious type systems (in the sense of being mathematically well-behaved and expressive). I'll look at OCaml since that's the one I know best (and it's more low-level/perf-oriented than, say, Haskell's GHC, imho).

Now, what is a "generic"? The term by itself doesn't say much. A generic function might first be parametrized by a type, like map : (T -> U) -> array T -> array U. These "generics" are called parametrically polymorphic functions. We can prove that their (untyped) implementation is the same for any type: they just pass opaque values around. These should be compilable to a single asm routine, as long as there is a homogeneous way to copy them (so either the opaque arguments need to be behind pointers, or the function needs to take the length of the runtime representation of that type so that it can memcpy it).
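
A rough OCaml sketch of this first kind (illustrative only): because OCaml erases types and gives every value a uniform one-word representation, a single compiled routine serves every instantiation.

    (* A parametrically polymorphic map over arrays: one implementation,
       usable at every element type. *)
    let map (f : 'a -> 'b) (xs : 'a array) : 'b array =
      Array.init (Array.length xs) (fun i -> f xs.(i))

    (* The same compiled routine handles all of these calls, since the
       code never needs to know what 'a and 'b actually are. *)
    let _ = map string_of_int [| 1; 2; 3 |]
    let _ = map String.length [| "a"; "bc" |]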

Another kind of "generic" is a function parametrized by a structure. These need their opaque arguments to satisfy some interface. The binding to the implementation of that interface can either be done by specializing the function for each given implementation, or by using vtables (or even dynamic linking). The difference between this case and the polymorphic case is that here the underlying structure (e.g. "compare" or "insert") isn't done in a homogeneous way (a single routine) for every type: the polymorphism is ad-hoc and not parametric. The vtable solution is a form of boxing, and that impacts runtime, so you need good inlining passes (which enable "specialization"): indeed, when you inline the box-taking function in a context where the precise type of (the inside of) the box is statically known, you can constant-propagate (more generally: partially evaluate) the implementation of the structure functions: you locally unbox.
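
A minimal OCaml sketch of the two binding strategies for this second kind (module names invented for illustration): a functor, where the implementation is fixed per instantiation and can be specialized, versus a first-class module passed at run time, which behaves like a vtable/dictionary.

    (* The interface the opaque type must satisfy. *)
    module type ORDERED = sig
      type t
      val compare : t -> t -> int
    end

    module IntOrd = struct
      type t = int
      let compare = compare
    end

    (* Functor style: the binding to [compare] is fixed per
       instantiation, so the compiler is free to specialize/inline it. *)
    module MakeMax (O : ORDERED) = struct
      let max a b = if O.compare a b >= 0 then a else b
    end

    module IntMax = MakeMax (IntOrd)

    (* Dictionary/vtable style: the implementation travels with the call
       as a first-class module, so the call goes through an indirection
       unless the optimizer later sees through it. *)
    let max_with (type a) (module O : ORDERED with type t = a) (x : a) (y : a) =
      if O.compare x y >= 0 then x else y

    let _ = IntMax.max 1 2
    let _ = max_with (module IntOrd : ORDERED with type t = int) 1 2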

So, in the end, for efficient compilation of languages with generics you need to have:

* an efficient compilation of parametrically polymorphic functions (by type erasure),

* a good run-time representation of interface implementations (modules, i.e. structs with functions and other stuff),

* and good inlining and partial-evaluation strategies for local unboxing.

There are other optimizations that can help, but these usually need some help from the programmer/library, like eliminating higher-order arguments by defunctionalization (any finite set of functions can be represented by a datatype and an "eval" function, like how closures are implemented in Rust). But then again, this depends on strategic inlining: very generic higher-order functions like a container's "fold" should almost always get inlined so that we can specialize them to their argument function and defunctionalize.
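
A small OCaml sketch of that defunctionalization idea (names invented for illustration): a known, finite set of step functions for a fold becomes a plain datatype plus an "eval" function, so the step is a match instead of an indirect call through a closure.

    (* Higher-order version: the step function is an arbitrary closure. *)
    let sum xs = Array.fold_left (fun acc x -> acc + x) 0 xs

    (* Defunctionalized version: the step is first-order data. *)
    type step = Add | Mul | Max

    let eval (s : step) (acc : int) (x : int) : int =
      match s with
      | Add -> acc + x
      | Mul -> acc * x
      | Max -> if x > acc then x else acc

    let fold_step (s : step) (init : int) (xs : int array) : int =
      let acc = ref init in
      for i = 0 to Array.length xs - 1 do
        acc := eval s !acc xs.(i)
      done;
      !acc

    let _ = sum [| 1; 2; 3 |]              (* 6 *)
    let _ = fold_step Mul 1 [| 1; 2; 3 |]  (* 6 *)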


> These should be compilable to a single asm routine, as long as there is a homogeneous way to copy them

That's a massive asterisk. In practice, copying values to the heap is complicated. Integer values may be memcpy'd. Pointer values may require read and/or write barriers, or perhaps reference counting. Compound types may require some combination. Maybe you have copy constructors and need arbitrary callouts! These implementation details can't be hand-waved away.

> The difference between this case and the polymorphic case is that here the underlying structure (e.g. "compare" or "insert") isn't done in a homogeneous way (a single routine) for every type: the polymorphism is ad-hoc and not parametric. The vtable solution is a form of boxing, and that impacts runtime

Kindly, I think you have some confused terminology. Parametric vs ad-hoc polymorphism is a difference in the type system, and occurs long before the optimizer kicks in. Parametric polymorphism is a single function implementation parametrized by types. Ad-hoc polymorphism is multiple implementations, and the compiler chooses one based on types (e.g. C++ overloading or template specialization).

If the optimizer chooses to emit a specialization, that does NOT make it ad-hoc polymorphism. That's just a detail of the optimizer. And it is a distinct optimization from inlining, for good reasons, like not wanting to be tied to the inlining cost model.

> for efficient compilation of languages with generics you need to have an efficient compilation of parametrically polymorphic functions (by type erasure)

C++ and Rust both have efficient generics without using type erasure. Boxed generics are not necessary at all, as long as you accept static linking.


You are certainly right in that observation, but there is slightly more to it. The "generic" concept from Java and C++ is more akin to the functors of the ML family. In that view, the question is not how to efficiently compile functions that are polymorphic (SML/NJ does that quite well, I think). The real question is: what are your compilation units?

If you opt in to the OCaml view, then modules (and functors) have to be compiled separately. Everything else follows. Java looks at this in a similar fashion, btw.

If you opt in to the C++ view, then only monomorphic code ever has to be compiled. In that scenario a template has no meaning on its own, except as an abstraction over code to be generated when needed.

In principle, SML/NJ shows how both views can be reconciled. One can easily envision separate compilation up to the point of specialization which is then done in a separate phase just before linking. I think no one does it that way, though, except maybe Swift.


I'm not sure I get what you mean by "hav[ing] to be compiled separately", but as far as I understand, the flambda intermediate representation of the OCaml backend has no (notable) distinction between a functor and a function (or a module and a record). Everything is untyped, so it works. That's also how the runtime values are (which need to exist, since there are first-class modules now). I believe that thanks to this OCaml does (could?) inline modules and functors. I'm not so sure about cross-file or cross-package inlining, though.



I have never understood this fetish for short compilation times.

Compilation time is an extremely good investment if it lets me express exactly what I need and gives me the fastest possible program as a result.


When building programs with user interfaces*, there are parts that _cannot_ be tested formally with either types or automated tests, because they require a human to test how things "feel."

In that context, long compile times lengthen the build-evaluate feedback loop to a point that harms quality.

* also note that APIs are user interfaces. So this harms way more than just programs with GUIs.


Same for data analysis. It is mostly exploratory and iterative work, with the need for a feedback loop (making plots, checking for unusual values, etc.) because the input data is unpredictable.

That's why data scientists prefer interpreted languages and Jupyter Notebooks.


That's also why you get the -O0 and -O3 options.


Long compilation times can hurt developer productivity, since developers need to wait longer to test the code they just wrote. It doesn't slow the developer down as much as having to write the code themselves, but it's still slower than paying the cost at execution time (in many cases, anyway).


I am generally very willing to trade compile-time speed for safety or runtime speed. One case I can think of where I wouldn't, though, is experimentation, where I need to experiment with a built artifact. For example, sometimes I want to run code in a dev environment to test it without building and running an entire microservice graph, and there I feel Scala's slow compile times rather acutely (even when skipping other steps like Checkstyle, unit tests, and integration tests).


This. A big part of the most high-performance code (like crypto, linear-algebra, or FFT kernels) is actually generated: you build a program that generates a program that then gets compiled (or is already asm). One can argue that the generator program is part of the compiler chain (it takes parameters describing some algorithm and generates executable code). And these generators sometimes call out to SMT solvers or otherwise solve complicated algorithmic problems (integer programming problems for vectorizing with the polyhedral model). So these are specialized high-level compilers that take a long time to output extremely fast and small programs. Most people don't run the generators, though; they only use the generated code: long compilation time is no problem if you have good modular compilation.

http://www.fftw.org/

http://flopoco.gforge.inria.fr/

https://github.com/FStarLang/kremlin

https://github.com/project-everest/vale


Quick iteration times make debugging and experimenting much easier. I spend much more time debugging and experimenting than I do optimizing or making high-level designs.


Ever heard of -O0 and -O3?



