How the Go runtime implements maps efficiently without generics (cheney.net)
197 points by kjeetgill on Sept 5, 2018 | 86 comments



If you're familiar with how Go does interfaces, it's kind of reminiscent of how they do maps. A variable of an interface type has two parts: a pointer to the itable mapping functions in the interface to their implementations in the concrete type, and a pointer to the data of that instance of the concrete type. The creation and assignment of that itable is invisible at the user level and baked in at compile time (with reasonable but irritating consequences like typed nils).

Similar to an itable, they're injecting a type descriptor (maptype) as a function parameter that provides things like the size of a key or value, alignment, kind, etc.
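
To make that concrete, here is a minimal sketch of the idea (illustrative only; these are not the actual runtime definitions, and the field and function names are made up):

    package main

    import (
        "fmt"
        "unsafe"
    )

    // maptype here stands in for the runtime's type descriptor: it carries
    // the type-specific operations so the map code itself can be shared.
    type maptype struct {
        keysize   uintptr
        valuesize uintptr
        hasher    func(key unsafe.Pointer, seed uintptr) uintptr
        equal     func(a, b unsafe.Pointer) bool
    }

    // The compiler would pass a descriptor like this at every call site
    // of a map[int]int operation.
    var intMaptype = maptype{
        keysize:   unsafe.Sizeof(int(0)),
        valuesize: unsafe.Sizeof(int(0)),
        hasher: func(key unsafe.Pointer, seed uintptr) uintptr {
            return uintptr(*(*int)(key)) ^ seed // toy hash, not the real one
        },
        equal: func(a, b unsafe.Pointer) bool {
            return *(*int)(a) == *(*int)(b)
        },
    }

    func main() {
        k := 42
        fmt.Println(intMaptype.hasher(unsafe.Pointer(&k), 0))
    }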

I had two questions: A) Why can't the hmap struct keep a pointer to the maptype? It shouldn't vary per call site, right? B) Does this count as generics?

It's strange. Like half-erased, half-reified generics. It seems erased in that the hmap itself doesn't carry the type information, but it's reified at each call site?


>B) does this count as generics?

I think "generic programming" is more about generic algorithms than it is about data structures. Because without generic functions, generic data structures are less useful.

Can you create a new map type in Go? Yes. Can you create a function that works on any map? Only with reflection, which you should consider your last resort.
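
For illustration, the reflection route looks something like this (a sketch; any type mismatch only surfaces at runtime):

    package main

    import (
        "fmt"
        "reflect"
    )

    // printMap accepts any map, at the cost of static type safety.
    func printMap(m interface{}) {
        v := reflect.ValueOf(m)
        if v.Kind() != reflect.Map {
            panic("printMap: not a map")
        }
        for _, k := range v.MapKeys() {
            fmt.Printf("%v => %v\n", k.Interface(), v.MapIndex(k).Interface())
        }
    }

    func main() {
        printMap(map[string]int{"a": 1})
        printMap(map[int]bool{2: true})
    }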


> Can you create a function that works on any map? Only with reflection

Most experienced Go developers at this point would template the function and generate it for any type they might need it for, before they reach for reflection.


Depends on the environment, I guess. I've been programming Go professionally for over 3 years, and I've never used such "templates" for generic types.

As I've said in another thread, I've had a need for sum types much more often than a need for generics.


> template the function and generate it for any type they might need it for

How? This is C++ generics. I thought Go didn't have generics?


I think they meant literal generation of files based on templates. Basically

  sed 's/T/int/g' file@T.go > file@int.go
Some people do that, I guess.



A little more 'Go' than that: using go generate and the template package. There are also other generators that can generate a 'Stringer' for any type, etc.
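
As a sketch of what that can look like (the file names and the generated Set type are made up for the example):

    // gen.go, run via a "//go:generate go run gen.go" directive.
    package main

    import (
        "os"
        "text/template"
    )

    var tmpl = template.Must(template.New("set").Parse(
        `// Code generated by gen.go; DO NOT EDIT.
    package mypkg

    type {{.Name}}Set map[{{.Elem}}]struct{}

    func (s {{.Name}}Set) Add(v {{.Elem}})      { s[v] = struct{}{} }
    func (s {{.Name}}Set) Has(v {{.Elem}}) bool { _, ok := s[v]; return ok }
    `))

    func main() {
        // One generated file per instantiation, like C++ templates by hand.
        for _, inst := range []struct{ Name, Elem string }{
            {"Int", "int"},
            {"String", "string"},
        } {
            f, err := os.Create("set_" + inst.Elem + "_gen.go")
            if err != nil {
                panic(err)
            }
            if err := tmpl.Execute(f, inst); err != nil {
                panic(err)
            }
            f.Close()
        }
    }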


The article explains how you can get arbitrarily typed values out of the map using unsafe pointers, but it never really explains how they're put in.

The insert example in the article uses a function that doesn't actually exist in the runtime package, and it implies you can just give it a plain value as input:

  m["key"] = 9001   → runtime.mapinsert(m, ”key", 9001)
I would assume that is only possible if the 9001 value can make its way through as an actual value on the stack, not as a pointer. Here's what I would expect to actually happen:

  ptr := runtime.mapallocate(m, "key")  // unsafe.Pointer
  *(*int)(ptr) = 9001  // Is there a non-typed way to do this?
While this is a good intro to hash tables in general, the Go-specific part feels a little bit lacking.


my guess is it allocates the value onto the stack and then inside mapinsert it knows to memcpy the value into the bucket because it knows the size of the value.

like

    tmp = 9001
    runtime.mapinsert(m, "key", &tmp)


Yup, that looks simpler than what I put in my comment. Do you know where the actual implementation is so we can confirm?


Nah. I just decompiled m["foo"] = 5 and it is closer to what is in your comment than mine.

so for this it does:

    push addr of string
    push $3
    call runtime.mapassign_faststr
    mov $5, 0($ax)

    test.go:4             0x104b4cb               488d05f9c20100          LEAQ go.string.*+91(SB), AX
    test.go:4             0x104b4d2               4889442410              MOVQ AX, 0x10(SP)
    test.go:4             0x104b4d7               48c744241803000000      MOVQ $0x3, 0x18(SP)
    test.go:4             0x104b4e0               e8cbc6fbff              CALL runtime.mapassign_faststr(SB)
    test.go:4             0x104b4e5               488b442420              MOVQ 0x20(SP), AX
    test.go:4             0x104b4ea               48c70005000000          MOVQ $0x5, 0(AX)
so it's kind of like:

    ptr = runtime.mapassign_faststr("foo")
    *ptr = 5

EDIT: oops confused 0x3 with the value the first time around


Thanks! That clarifies a few things. Too bad this isn't covered in the blog post which was specifically about how Go implements maps efficiently. Here are two interesting things the post doesn't cover:

1) How the insert works and why it's this way instead of any other;

2) Apparently the runtime has specialized implementations to insert values for some key types (string in this case), which implies it does in fact have multiple implementations of the "same" code.

Edit: I'm also curious how plain Go code can assign a value directly to an unsafe pointer without expressing the type? Or is that made unnecessary because the compiler rewrites the `myMap["key"] = value` step and has free rein to output unsafe, untyped Go code that isn't normally valid?


yeah. my guess is the compiler just outputs its intermediate format when it sees map assignment and can just do whatever it wants.

for reference you can play around with this by doing:

    go build test.go
    go tool objdump ./test
then search for test.go in the output and you should be able to find the go 'assembly' for your functions. also, if you use a big enough struct type then go will call into a memcpy() style routine to do the assignment.

    test.go:32            0x108d791               488d15d7170300          LEAQ go.string.*+255(SB), DX
    test.go:32            0x108d798               4889542410              MOVQ DX, 0x10(SP)
    test.go:32            0x108d79d               48c744241803000000      MOVQ $0x3, 0x18(SP)
    test.go:32            0x108d7a6               e835c2f7ff              CALL runtime.mapassign_faststr(SB)
    test.go:32            0x108d7ab               488b7c2420              MOVQ 0x20(SP), DI
    test.go:32            0x108d7b0               488d742430              LEAQ 0x30(SP), SI
    test.go:32            0x108d7b5               48896c24f0              MOVQ BP, -0x10(SP)
    test.go:32            0x108d7ba               488d6c24f0              LEAQ -0x10(SP), BP
    test.go:32            0x108d7bf               e83807fcff              CALL 0x104defc
0x104defc for me looks like it's duffcopy from this file: https://golang.org/src/runtime/duff_amd64.s


Go hashmaps are generics, or at least parameterized types. You can't create new parameterized types in Go, but you get maps and channels built in.


By that definition C89 has generics too (arrays and pointers). The whole point of generics is that they are user definable. Everything else is hairsplitting.


I think it's relevant in the sense that it's not particularly impressive once you know the answer. Basically the answer to the title "How the Go runtime implements maps efficiently (without generics)" is "by using compiler magic".

Furthermore, the article presents Go's approach as superior to C++'s because of the reduced code generation (since it uses a "maptype" struct for the type-specific information), but doesn't talk about the drawbacks of such an approach: you have additional indirection in the mapping code, and the compiler can't inline and optimize for specific types. Of course, in some cases that's actually desirable; very large code size can be problematic.

I could also add that you could pretty easily implement Go-style hashes in C++ (actually that's often how you implement generic hashes in C, using helper functions depending on the type) but you can't implement C++-style hashes in Go without hacking the compiler or using codegen.


C11 got user-defined generics, FYI.

https://en.cppreference.com/w/c/language/generic


This is something else. C++ has had parametric polymorphism in the form of template metaprogramming for a long time (the article on hashmaps even covers it).

This is one of the reasons I don't really like the name 'generics'; it's a little ambiguous.


It was the term coined by CLU, Ada, and Modula-3, so it is already widely used in academia.


Yes, exactly correct. But nonetheless, the way Go makes maps efficient is that it has generics for maps.


and arrays and slices and pointers.

> You can't create new parameterized types in Go

Or create functions on parameterized types (without providing concrete type parameters), which is possibly a bigger annoyance than the other one: slices are a thing, but you can't operate on arbitrary slices without reflection. As well as generic types being restricted to builtins, generic functions (e.g. close, copy, len) also are.


We're provided with some nice fake type signatures for such builtin functions in the godoc:

https://golang.org/pkg/builtin/#copy

https://golang.org/pkg/builtin/#Type


If Go 2 is bringing us generics, then couldn't that `builtin` package provide real type signatures for the builtin functions instead of fake ones? Instead of

  func delete(m map[Type]Type1, key Type)
it could have

  func Delete(type Type, Type1 somecontract) (m map[Type]Type1, key Type)
in there, and a program compiled with Go 2 would have an implied instance of

  func delete = builtin.Delete(T, U)
in its global namespace for every map with a `delete` used somewhere in the code.


Yeah, for me this is the big problem. For example, I once built a generic flow-based library in Go for work (Go was not my choice, and while I like Go it was not really the best choice for this library). The idea is to have a pipeline of goroutines where data flows from one stage in the pipeline to the next. Channels and goroutines are awesome for this, but setting up all the types for the channels based on the types in the functions being run by the goroutines was quite difficult. I had to do it all by reflection and the code was really complicated. With generics, it would have been trivial.
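
For a flavor of what that reflection-based wiring looks like (a minimal sketch, not the actual library described above):

    package main

    import (
        "fmt"
        "reflect"
    )

    // runStage starts a goroutine that receives from in, applies fn, and
    // sends the result on out. fn must be a func(T) U, in a chan T, and
    // out a chan U; any mismatch only shows up at runtime.
    func runStage(fn, in, out interface{}) {
        fnV, inV, outV := reflect.ValueOf(fn), reflect.ValueOf(in), reflect.ValueOf(out)
        go func() {
            for {
                v, ok := inV.Recv()
                if !ok {
                    outV.Close()
                    return
                }
                outV.Send(fnV.Call([]reflect.Value{v})[0])
            }
        }()
    }

    func main() {
        in, out := make(chan int), make(chan string)
        runStage(func(i int) string { return fmt.Sprint(i * 2) }, in, out)
        in <- 21
        fmt.Println(<-out) // prints 42
        close(in)
    }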

The proposal in the 2.0 spec looks promising and I'm looking forward to hopefully being able to play with an implementation sometime.


> setting up all the types for the channels based on the types in the functions being run by the goroutines was quite difficult. I had to do it all by reflection and the code was really complicated.

My guess (and personal experience) is that it would be more convenient to do this via code generation. It's not as ergonomic as generics (well, depending on the implementation...), but it's still less cumbersome and less prone to failing at runtime than reflection.

Reflection has its uses and places, but as soon as it gets all over a function then it's time (imho) to look into code generation.


I definitely would have done it that way, but there was no standard code generation in Go at the time. I could have whipped something up, but it's hard to say which would have been more complex.


As an anecdote: my go-to 'code generation' solution in Go is based on text/template. The '//go:generate' comment on the first line of the source file generating the final code is a mere helper, and I could do without it any time. Again, that's just my experience ;)


Don't forget slices and arrays


How can the author claim that "Just like C++ and just like Java, Go’s hashmap written in Go" (sic) when there is clearly a whole bunch of compiler magic going on that goes beyond plain Go?


The implementation is in Go, compiled from the source of the runtime library when that was built, but the syntactic desugaring of your map-accessing code is happening in the compiler.


That's still a bunch of magic that C++ and Java don't really get to use though, isn't it?

Or would you consider this to be on the same level as adding opcodes to the JVM?


I'd consider it fairly similar to the distinction between Java and the JVM, yes, in that the JVM technically has more power than the Java language exposes. There are multiple levels, with different powers.

You're not going to be able to draw a sharp line here where Go is somehow on one side of it and most other languages aren't. In a sense the entire point of a "language" in the first place is to provide various bits of "magic" to make your life easier. For instance, you can't implement your own exceptions system, because the compiler hooks in at a deeper level than you can get to. (In both cases you can probably do it via what are technically platform-specific hacks, but I wouldn't say that you've implemented it in the language.) The only language with "no magic" is pure assembler, with either no macros or at least no predefined macros.


Actually even Assembly has magic if the CPU is microcoded.


Sure, the desugaring is special treatment, compiler "magic" that is not available at all to other libraries that might like to similarly hook into the map syntax. I was just trying to explain how this does not have to conflict with the claim that the map implementation (the runtime library functions that are called by the desugared version) is plain, regular Go. You could rewrite the map implementation, you just would not get the nice syntax.

Compared to Java, it's somewhere in the middle between the regular, entirely "un-magic" implementation of maps and the deeply built in nature of the array primitive, which is not only special at compile time (like Go maps) but also at runtime.


I really don't understand why Go doesn't have generics. Shouldn't it be even easier because Go has no inheritance? So the problem of covariance and contravariance goes away.

Seems really strange.


There are proposals to add generics soon: https://go.googlesource.com/proposal/+/master/design/go2draf...

It's not a question of whether or not it's easy or hard; it's definitely technically doable.

It's a question of whether or not it's desirable. I think the consensus is shifting decisively to saying that it _is_ desirable. The question then becomes how you can assert concepts on a type at compile time.

E.g. for a function add(a T, b T) and a type T, how do we assert that a + b is a valid statement? The draft design shows one approach to this.
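
Roughly, the draft expresses that constraint with a contract (this is a sketch in the 2018 draft's proposed syntax, not valid Go today, and the names are illustrative):

    // Sketch in the style of the 2018 contracts draft design.
    contract addable(t T) {
        t + t // T must support the + operator
    }

    func Add(type T addable)(a, b T) T {
        return a + b
    }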


> It's not a question of whether or not it's easy or hard; it's definitely technically doable. It's a question whether or not it's desirable.

And everyone not trying to make excuses for the Go-team knows what the one and only answer to that question is.


Rob Pike wrote a bit about it in "Less is exponentially more". He points out that generics are used by languages that focus on types, and that Go focuses on composition and uses interfaces (and language builtins) instead of types to solve problems.


> generics are used by languages that focus on types, and that Go focuses on composition and uses interfaces (and language builtins) instead of types to solve problems.

Interfaces are (abstract) types, so you can't “use interfaces instead of types”. Go’s use of interfaces is a variation of the “you can only have subtypes of abstract types” approach. Julia is an example of another modern language which also uses this general approach to type hierarchy, which, yes, does correspond to a strong preference for composition over inheritance; OTOH, like most relatively modern non-Go languages with this approach, it also has generics. The idea that generics are mostly used by languages that don't follow this approach is true in the sense that most languages, and therefore most with generics, do not follow this approach. But modern languages besides Go that follow the “composition-over-inheritance, no subtyping concrete types” approach seem to be just as likely as languages which support subtyping concrete types to have generics.

So, as well as not actually explaining anything (generics and composition-over-inheritance don't solve the same problem, so using the latter doesn't explain not using the former even if they did tend to coincide in practice), it doesn't seem to point to a real correlation.


Composition is orthogonal to generics.


>Rob Pike wrote a bit about it in "less is exponentially more".

Which is wrong.

>Go focuses on composition

Clearly false.


How is this false? I agree with this assertion after having used Go extensively for some time. Where Go really shines is when the stdlib offers a nice abstraction (io.Reader, io.Writer, http.Handler) and you can compose your programs from building blocks that use these abstractions. For example, a middleware interface follows directly from the http.Handler interface:

  type Middleware interface {
    Wrap(http.Handler) http.Handler
  }
without requiring any generic types. For comparison, the same thing in Rust would rely on type polymorphism:

  //assuming that Handler is a trait
  trait Middleware<T: Handler> {
    type Result: Handler;
    fn Wrap(handler: T) -> Result;
  }
  //NOTE: Not using impl-trait here to make it clear that
  //two type variables are involved (one type parameter
  //and one associated type).
You could mimic the Go behavior if you box all your traits, which Rust does not like to do because it's not zero-cost:

  trait Middleware {
    fn Wrap(handler: Box<Handler>) -> Box<Handler>;
  }


The current generics proposal includes some form of ad-hoc polymorphism (overloading) on top of standard parametric polymorphism, which complicates things quite a lot.

Also, choosing which compilation strategy to use has a big impact on the compiler architecture (monomorphization? boxing? a mix of both?)


Time and priorities. They didn't have the time until now as their priorities were elsewhere (as they should have been, they were building tools for their own use cases).

You can argue until the heat death of the universe about whether those priorities were chosen wisely, but it's irrelevant now. They built the tools for themselves, focusing on what they found to be most important at the time. It's clearly worked out well enough for them, and many of us like how Go turned out despite its shortcomings.


I think the article answers that. There's a whole bunch of design and implementation decisions that would be very different and more complex with generics.


This is really not different from how generics are implemented in many languages; it's almost identical to how Haskell implements parametric functions w/ class constraints (i.e. a dictionary w/ concrete functions for implementing the algorithm for a specific set of type parameters is passed): https://www.schoolofhaskell.com/user/jfischoff/instances-and...

The only difference is that Haskell generic functions can be specialized and the methods in the dictionary can be inlined: https://wiki.haskell.org/Inlining_and_Specialisation#What_is...
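
A rough Go rendering of that dictionary-passing style (illustrative; the names are made up): the "dictionary" bundles the type-specific methods, and one copy of the algorithm is written against it.

    package main

    import "fmt"

    // eqDict plays the role of a typeclass dictionary for equality.
    type eqDict struct {
        equal func(a, b interface{}) bool
    }

    // The "instance" for int, chosen by the caller.
    var eqInt = eqDict{equal: func(a, b interface{}) bool {
        return a.(int) == b.(int)
    }}

    // elem is written once; the dictionary supplies the type-specific part.
    func elem(d eqDict, x interface{}, xs []interface{}) bool {
        for _, v := range xs {
            if d.equal(x, v) {
                return true
            }
        }
        return false
    }

    func main() {
        fmt.Println(elem(eqInt, 2, []interface{}{1, 2, 3})) // true
    }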


So basically they invented a runtime type system. Again.


Yeah, a callback approach like this is actually one of the common strategies for data structures implemented in C. Others include a requirement that both the key and value should be byte sequences or equivalent (you can always `memcpy` primitives from/to byte sequences).


I think there's some contradiction between those statements:

> Rather than having, as C++ has, a complete map implementation for each unique map declaration, the Go compiler creates a maptype during compilation and uses that value when calling into the runtime’s map functions.

> Does the compiler use code generation?

> No, there is only one copy of the map implementation in a Go binary

The compiler certainly does code generation, as outlined in the article. It just minimizes the amount of generated code, and uses explicit non-generic functions on some higher layers. One can do the same in a generic C++ map implementation too, by delegating some of the logic into non-generic functions. It's a thing that is commonly done in manually size-optimized code. One could also hope that the compiler automatically detects some of the duplicated code between template instances, and deduplicates it.


The nice thing about C++ and templates is that you can implement a scheme exactly like this without special help from the compiler :)


The claim is that Go maps are "very fast" but that's relative to what you're trying to do. If you're writing an interpreter loop, you want to avoid maps as much as you can and use arrays instead.


Indeed. Go maps are slow as hell. Every time I need them in a tight loop, I have to make an ad-hoc equivalent with slices, if at all possible.


I believe it's because map accesses are not inlined in Go.

Swift uses a similar strategy to Go's to implement Dictionary (a runtime data structure that describes data access is passed as a supplementary parameter), but the Swift backend is able to inline most of the calls. So at some point in the future, Go maps may be faster.


(This is a bit off-topic, but I couldn't help it.)

I noticed the article refers to "hashmap" as a data structure. Such as this statement:

> Go maps are hashmaps

I think instead of hashmap, this article could use "hash table", which is the actual data structure. The term hashmap sounds almost Java-specific to me (but I understand it helps distinguish when you have both TreeMap and HashMap in a language, like Java).

This is a cool article regardless, giving a sneak peek into the internals of Go's map type.


Didn't sound that Java-specific to me:

https://doc.rust-lang.org/std/collections/struct.HashMap.htm...

http://hackage.haskell.org/package/hashmap-1.3.3/docs/Data-H...

https://stdlib.ponylang.org/collections-HashMap/

https://api.dartlang.org/stable/1.24.3/dart-collection/HashM...

https://pocoproject.org/docs/Poco.HashMap.html

For me, HashMap is an abstract data type that adheres to the Map interface and is implemented with (i.e. has the performance characteristics of) a hash table. It's a subset of Map, hence the "go's maps are hashmaps" remark.

HashSet is often implemented with hash tables too. I guess the difference is ADT vs data structure (and I see your point there), not Java-specificity.


The dict/hash implementation in K (since 1993 or so ..) is much simpler and cleaner, takes much less memory, and is often faster - though speed depends on the actual use case more than anything.

The idea is that instead of the hash map being a list of <key,value> items indexed by key, there's a vector of keys, a vector of values (with matching indices), and a search index (usually a hash, but could be a tree) that maps a key to the index; note that the search index does not need a copy of the key even if it is inexact because one can always refer to the keys vector.
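
A minimal sketch of that layout (in Go, just to make it concrete; Go's built-in map stands in for the hash index purely to keep the sketch short):

    package main

    import "fmt"

    // kdict: parallel key/value vectors plus a search index mapping
    // key -> slot, as in the K layout described above.
    type kdict struct {
        keys   []string
        values []int
        index  map[string]int
    }

    func newKdict() *kdict { return &kdict{index: map[string]int{}} }

    func (d *kdict) set(k string, v int) {
        if i, ok := d.index[k]; ok {
            d.values[i] = v
            return
        }
        d.index[k] = len(d.keys)
        d.keys = append(d.keys, k)
        d.values = append(d.values, v)
    }

    func (d *kdict) get(k string) (int, bool) {
        i, ok := d.index[k]
        if !ok {
            return 0, false
        }
        return d.values[i], true
    }

    func main() {
        d := newKdict()
        d.set("a", 1)
        d.set("b", 2)
        // keys() and values() are just the vectors: O(1), allocation-free,
        // and in insertion order.
        fmt.Println(d.keys, d.values)
    }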

The cons:

* You can't pass a single <key,value> object without building one

* You may incur two or three cache loads (one for the index, one for the key, and if it matches, one for the value), whereas in a conventional <key,value> list they are likely to be on the same cache line.

* (edit: added based on aidenn0's comment, see discussion below): Removal is generally O(n) unless you are willing to give up one of the pros; but even if you give some up, it's still usually more featureful than commonly used implementations.

The pros:

* Insertion order is preserved - OrderedDict for free!

* Getting the list of keys, or the list of values, e.g. Python's keys() and values() methods, is O(1) and allocation-free

* Applying a function to each item needs to allocate a new value vector, but can reuse the key vector -- and if the function doesn't need the key, you don't actually have to fetch it - e.g., to normalize a map so values sum to 1, you can do:

    whole = sum(mymap.values())
    normalized = apply_values(mymap, lambda x: x/whole) 
And get a normalized map with the same keys and literally the minimal number of allocations and operations possible for a concrete datatype

* Specialization is by the key type, regardless of the value type - which means that a few efficient implementations (for ints, longs, floats, strings, symbols) manage to often provide C++ speeds for an interpreted language.

* Memory packing is about as tight as it can be. Your key is a double, and your value is one byte? You pay exactly 9 bytes per record + index size (which, e.g. for a 30,000 record map, takes two bytes per hash slot, or ~128KB for a 64K slot table). This is true whether you are running on a 16-bit, 32-bit, 64-bit or 128-bit processor.

* It is straightforward and natural to implement with almost no pointers involved, meaning that it is trivial to share memory among processes (at least for read only accesses) and to use with memory-mapped storage -- to the point that you can have an 8GB dictionary which is instantly mmap()/MapViewOfFile()d into existence, and is no different in any way than any other one - you pay disk access for the records you use, but enjoy OS cache/page management, with cache shared between processes and between subsequent runs -- no need to special case this dictionary just because it's big.

The Python 3.6 implementation switched from a hash-indexed <hash,key,value> vector to a <hash -> order> and ordered <key,value> vector, and saw speed improvements and memory use reductions. Due to Python's legacy and structure, they could not have gone farther on the K way, except for CPU cache effects.

Having used APL and K for over 15 years now, I find it weird seeing the same Smalltalk-inspired hash table structure come up in every single new runtime and language implementation.


There is a lot to learn from K, but there are two problems:

1) It is not open source, but a commercial product. Although some source code can be found online, there is no license, so I would not dare use any of it in a free software project.

2) Whitney's code is worth studying. I have spent a lot of time "deciphering" the b interpreter in kparc and it has been a great exercise, but it is not for everybody. Most people will find it unnecessarily obfuscated. It certainly is not obvious for a casual observer.

I wish (1) changed, then we could see annotated versions of the code that made it more palatable to solve (2) too. But according to what I have seen, this is not going to happen.


Totally agree. In the context of learning lessons from K about implementing dict/hashmap, though, neither of these matter.


Totally agree too :)

In particular, your explanation about K maps would be a great addition to Cheney's post.


One wish of mine (because I'm working on a relational/array language) is to find a minimal implementation of K, annotated and explained - a "Let's build an array interpreter" in the spirit of http://www.craftinginterpreters.com/.


I’m on mobile so you’ll have to google it yourself, but ...

There was such a series based on Lisp that included a mini APL interpreter (among others) by .. Tim something?

Specifically for K, John Earnest’s oK is a wonderful K implementation in JavaScript. It’s not “crafting interpreters” of course, but IIRC it is well written and well documented, definitely not the terse Whitney style.


I took a look at https://github.com/JohnEarnest/ok/blob/gh-pages/oK.js and it is still dense. It's possible to trace things, but the important stuff is understanding why things are done that way.

I also read https://sblom.github.io/openj-core/ioj.htm, which gives more context.


oK is ultra verbose compared to Whitney ... not sure if you can find anything more verbose than that.

I was thinking of Tim Budd's interpreters and compilers; forgot they were reimplementations of Kamin's. See https://www.cs.cmu.edu/afs/cs/project/ai-repository/ai/lang/... - although it's early '90s, so likely too dated to be useful for you.


Wouldn't that have O(n) removal time?


With all these attributes, yes. Thanks, I've updated my original comment to reflect this.

If you give up insertion order, you can move the latest insertion to the deleted place, and you're back to O(1).
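
Continuing the Go sketch from the earlier comment, swap-delete might look like this (hypothetical, as before):

    // del removes k in O(1) by moving the last entry into the vacated
    // slot; the vectors no longer reflect insertion order afterwards.
    func (d *kdict) del(k string) {
        i, ok := d.index[k]
        if !ok {
            return
        }
        last := len(d.keys) - 1
        d.keys[i], d.values[i] = d.keys[last], d.values[last]
        d.index[d.keys[i]] = i // re-point the moved entry
        d.keys = d.keys[:last]
        d.values = d.values[:last]
        delete(d.index, k)
    }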

If you give up the O(1) keys and values list access, you can add a "deleted" bit to every entry to say that it is indeed deleted, but then you can't reuse keys() and values() as-is. Some operations, such as "apply-to-values" can use the vectors with the deleted values, but in general they cannot.

In my experience, deleting from a dict is about 1:10 less frequent, in the sense that the vast majority of dicts never experience a deletion. YMMV, obviously.

Over 3 decades of programming professionally, the K tradeoffs make way more sense to me than the Smalltalk ones.


I agree; there are a lot of tradeoffs you can make for the map type, and having O(1) amortized deletion usually comes at a cost of making all the other operations slower (either by a constant factor, or asymptotically).

I do wish more people were aware of this because (as you say) deletions are much more rare.

Often compilers will use a non-standard map for just this reason (symbol tables basically never delete).


> the vast majority of dicts never experience a deletion

Come to think of it, I cannot recall when I last used the delete() function in Go, even though I write a lot of Go code.


> Each different map are different types. For N map types in your source, you will have N copies of the map code in your binary.

Is this necessarily true? IIRC the compiler will roll identical functions into one if it detects that they have the same body (instruction-wise).


Won't the reads and writes to the data structure depend on different constants (data sizes) in the different functions, making them unfoldable?


Ok, here's a weaker question: does instantiating std::unordered_map<int, int*> and std::unordered_map<int, long*> cause two copies of the template to appear? Yes, I know that the functions must both appear so that they can be called - but do the function bodies get duplicated (as opposed to one "function" just falling through into the shared implementation)?


There's an optimization that targets specifically this case: identical code folding. It's not safe in general, because the standard requires that the different versions must have different addresses, but if the compiler can prove that at least one of the functions doesn't have its address taken (or if the user passes in a flag saying "no, I don't care about that bit of the standard"), they can be merged.


Isn't it trivial to satisfy the standard by making one of the functions use a trivial indirection? Or just prepend the function body with a nop and have one of the functions point there.

I am curious which part of the standard specifically disallows aliasing function pointers though. I guess it's only a problem for static member functions that have the same signature for both instantiations, since only these can have the same type.

Edit: Actually, is it the opposite? Aliasing is allowed for pointers of the same type; the problem would be the aliasing of function pointers of different types. Then again, I think the nop solution should work.


I haven’t checked but I wouldn’t be surprised if they are duplicated.

(Edit to add: if this is done, it'll be part of the link-time optimization step, which I think is relatively new in all the major C++ compilers.)

If you include debugging symbols, there will probably be multiple copies of the duplicated code just because it has different names.

You certainly should be able to merge duplicated functions after stripping symbols but I don’t know if that’s a standard thing. I’ve definitely been disappointed in the past by the performance of “dead code stripping” optimization. It’s not quite as trivial a problem as it sounds because when you have groups of functions that call each other, you need to unify two graphs rather than just look for individual identical blobs. The obvious implementation (multiple passes looking for identical leaf functions) is slow and will miss some cases.


Debug symbols don’t inhibit identical code folding. What you get is one copy of the function and one arbitrary file/line entry which makes it look like your program makes impossible function calls, if you happen to walk the stack. This can easily be seen in a C++ program with lots of Protocol Buffers because protobuf generated code has lots of identical small functions.


> which I think is relatively new in all the major C++ compilers.

LTO has been available in gcc since 2009 and even longer in clang AFAIK. MSVC had LTO for... I don't know, but googling a bit shows that Visual Studio 2005 supports /GL.


Yeah, sorry, I should have clarified “relatively recent” as meaning the last 10 years or so. That’s recent for C++! :)

This stuff is being actively worked on so I don’t know if merging identical template instantiations is a thing yet. Any idea?


MSVC has /OPT:ICF which does this. GCC has -fipa-icf, and the GNU Gold linker has --icf={safe,all} options (since it's operating on ASM it needn't be at the compiler level).

In both cases it works not only with templates, but with any kind of function. However, if you use --icf=all, some subtle bugs can appear since function pointers that would compare different in a default build would now compare equal. I have never been bitten by those.


Cool! I’ll give those a try.


Generally you would have a different implementation for each struct you use as a key. And this is a template argument so this would certainly get in the way of folding.


Not necessarily, but it takes extra work to make that happen (and it’s the linker that does it, not the compiler. See for example https://ai.google/research/pubs/pub36912).


Which do you mean by "the compiler"? I'm pretty sure the Go compiler doesn't have a feature like that.


This was referencing templated maps, so your favorite optimizing C++ compiler.


Or most optimizing compilers written after the mid 80s?

(Looking to be corrected, not sure)


It's worth discussing: Just because "the" Go compiler doesn't implement it now doesn't mean a Go compiler won't in future


Every time I see an article from Dave Cheney, I wonder why it is so popular.



