None of the example languages (C,C++,java) have serious type systems (in the sense of mathematically well-behaved and expressive). I'll look at OCaml since that's the one i know best (and it's more low-level/perf-oriented than say Haskell's GHC imho).
Now what's a generic: it doesn't say much. A generic function might first be parametrized by a type, like map : (T -> U) -> array T -> array U. These "generics" are called parametrically polymorphic functions. We can prove that their (untyped) implementation is the same for any type: they will just pass opaque values around. These should be compilable to a single asm routine, as long as there is an homogeneous way to copy them (so either the opaque arguments need to be behind pointers, or we need the function to take the length of the runtime representation of that type so that it can memcpy it).
Now another type of "generic" are functions parametrized by a structure. These need their opaque arguments to satisfy some interface. The binding to the implementation of that interface can either be done by specializing the function to any given implementation, or by having vtables (or even by dynamic linking). The difference between this case and the polymorphic case is that here the underlying structure (eg "compare" or "insert") isn't done in a homogeneous way (single routine) for every type: the polymorphism is ad-hoc and not parametric. The vtable solution is a form of boxing and that impacts runtime, so you need to have good inlining passes (which enables "specialization"): indeed when you inline the box-taking function inside a context where the precise type of the (inside of the) box is statically known, you can constant-propagate (more generally: partialy-evaluate) the implementation of the structure functions: you locally unbox.
So in the end for efficient compilation of languages with generics you need to have
* an efficient compilation of parametrically polymorphic functions (by type erasure),
* a good run-time representation of interface implementations (modules, ie structs with functions and other stuff)
* and good inlining and partial-evaluation strategies for local unboxing.
There are other optimizations that can help but these usually need some help from the programmer/library, like eliminating high-order arguments by defunctionalization (any finite set of functions can be represented by a datatype and an "eval" function, like how closures are implemented in rust). But then again, this is dependent on strategic inlining: very generic high-order functions like container "fold" should almost always get inlined so that we can specialize them to their argument function and defunctionalize.
> These should be compilable to a single asm routine, as long as there is an homogeneous way to copy them
That's a massive asterisk. In practice, copying values to the heap is complicated. Integer values may be memcpyed. Pointer values may require read and/or write barriers, or perhaps reference counting. Compound types may require some combination. Maybe you have copy-constructors and you need arbitrary callouts! These implementation details can't be hand-waved away.
> The difference between this case and the polymorphic case is that here the underlying structure (eg "compare" or "insert") isn't done in a homogeneous way (single routine) for every type: the polymorphism is ad-hoc and not parametric. The vtable solution is a form of boxing and that impacts runtime
Kindly, I think you have some confused terminology. Parametric vs ad-hoc polymorphism is a difference in the type system, and occurs long before the optimizer kicks in. Parametric polymorphism is a single function implementation parametrized by types. Ad-hoc polymorphism is multiple implementations, and the compiler chooses one based on types (e.g. C++ overloading or template specialization).
If the optimizer chooses to emit a specialization, that does NOT make it ad-hoc polymorphism. That's just a detail of the optimizer. And it is a distinct optimization from inlining, for good reasons, like not wanting to be tied to the inlining cost model.
> for efficient compilation of languages with generics you need to have an efficient compilation of parametrically polymorphic functions (by type erasure)
C++ and Rust both have efficient generics without using type erasure. Boxed generics are not necessary at all, as long as you accept static linking.
You are certainly right in that observation. But there is slightly more to it. The "generic" concept from Java and C++ is more akin to the functors of the ML family. In that view, the question is not how to efficiently compile functions that are polymorphic (SML/NJ does that quite well, I think). The real question is, what are your compilation units.
If you optin to the OCaml view, then Modules (and Functors) have to be compiled separately. Everything else follows. Java looks at this in a similar fashion, btw.
If you optin to the C++ view, than only monomorphic code has ever to be compiled. In that scenario a template has no meaning on its own, except as an abstraction over code to be generated when needed.
In principle, SML/NJ shows how both views can be reconciled. One can easily envision separate compilation up to the point of specialization which is then done in a separate phase just before linking. I think no one does it that way, though, except maybe Swift.
I'm not sure i get what you mean by "hav[ing] to be compiled separately", but as far as i understand, the flambda intermediate representation of the ocaml backend has no (notable) distinction between a functor and a function (or a module and a record). Everything is untyped so it works. That's also how the runtime values are (which need to exist since there are first-class modules now). I believe that thanks to this ocaml does (could?) inline modules and functors. I'm not so sure about cross-file or cross-package inlining tho.
Now what's a generic: it doesn't say much. A generic function might first be parametrized by a type, like map : (T -> U) -> array T -> array U. These "generics" are called parametrically polymorphic functions. We can prove that their (untyped) implementation is the same for any type: they will just pass opaque values around. These should be compilable to a single asm routine, as long as there is an homogeneous way to copy them (so either the opaque arguments need to be behind pointers, or we need the function to take the length of the runtime representation of that type so that it can memcpy it).
Now another type of "generic" are functions parametrized by a structure. These need their opaque arguments to satisfy some interface. The binding to the implementation of that interface can either be done by specializing the function to any given implementation, or by having vtables (or even by dynamic linking). The difference between this case and the polymorphic case is that here the underlying structure (eg "compare" or "insert") isn't done in a homogeneous way (single routine) for every type: the polymorphism is ad-hoc and not parametric. The vtable solution is a form of boxing and that impacts runtime, so you need to have good inlining passes (which enables "specialization"): indeed when you inline the box-taking function inside a context where the precise type of the (inside of the) box is statically known, you can constant-propagate (more generally: partialy-evaluate) the implementation of the structure functions: you locally unbox.
So in the end for efficient compilation of languages with generics you need to have
* an efficient compilation of parametrically polymorphic functions (by type erasure),
* a good run-time representation of interface implementations (modules, ie structs with functions and other stuff)
* and good inlining and partial-evaluation strategies for local unboxing.
There are other optimizations that can help but these usually need some help from the programmer/library, like eliminating high-order arguments by defunctionalization (any finite set of functions can be represented by a datatype and an "eval" function, like how closures are implemented in rust). But then again, this is dependent on strategic inlining: very generic high-order functions like container "fold" should almost always get inlined so that we can specialize them to their argument function and defunctionalize.