Mold: A Modern Linker

PoignardAzur · on Feb 23, 2021

I wonder how rui314's assertion that incremental linking is a poor tradeoff squares with Zig's decision[1] to build its own linker with "in-place binary patching".

I assume part of the difference is that Zig has complete control over its environment, whereas Mold is trying to be a general-purpose linker, but still, I wonder if there's some insight to be gained from crosspollination there.

(like, maybe Mold could have an "--incremental" option with associated documentation explaining that this option is only beneficial if the input object files follow some specific guidelines)

[1] https://kristoff.it/blog/zig-new-relationship-llvm/

rui314 · on Feb 23, 2021

My impression (so take it with a grain of salt) is that, for incremental linking, Zig emits code that always uses the PLT and GOT to access global functions and variables. Later, in order to replace functions or variables, Zig rewrites only PLT or GOT entries.

On the other hand, by default, gcc or clang emits code that does not use GOT or PLT, which makes the situation much more complicated.

In addition to that, maybe you don't have to support all ELF fancy features if you know that you are linking Zig programs? I'm not familiar with Zig, but I can say that some kind of minor feature, such as weak symbol, can make incremental linking a lot harder.

AndyKelley · on Feb 23, 2021

Your impression is correct.

When the Zig compiler is asked to produce an executable, using only .zig source code, it is in full control of the entire compilation process, frontend to backend, so it can make decisions that ease the requirements of its linker, in order to facilitate incremental compilation and linking. For example, when linking pure Zig projects, there are no relocations; the code is generated directly in place in the final executable, almost as if there is no linker step. However, when asked to link against third party objects or static libraries, Zig must compromise some of these advantages. Currently, in this situation, Zig will fall back to doing the incremental compilation of .zig code into an object file, and then invoke LLD (via the zig executable invoking itself as a child process) to link the objects together. As the Zig self-hosted linker gains functionality, this fallback will happen less often; instead the compromise will be in the code paths taken in the linker, based on what assumptions it can make about the linking requirements that are required for a given job. The long term plan is to eliminate the dependency on LLD altogether.

Side note - mold is a brilliant project! Thank you for making it and pushing the state of the art forward! Also I love the logo.

lhorie · on Feb 23, 2021

Zig's compiler isn't optimizing for linking speed over everything else though AFAIK. Also, Andrew says in the thread[0] that the linker doesn't (yet) address the types of complex scenarios that rui talks about.

[0] https://github.com/ziglang/zig/issues/1535#issuecomment-6784...

_dh54 · on Feb 23, 2021

> I won't avoid Unix-ism when writing code (e.g. I'll probably use fork(2)).

It’s probably fine to not avoid most Unix APIs but fork() is truly an exception here. Fork() is not friendly to any third party library you may use because their state may become invalidated after a fork but the library has no way to know if the process has been forked. This is especially bad if the library uses multi threading. The best way to avoid a random unintended bug later on is to only use fork() when you plan to execve() right after.
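
A minimal sketch of the "fork() only to execve() right after" discipline described above, assuming a POSIX system. The helper name `run_and_wait` is hypothetical and not part of mold; the point is only that the child touches no inherited library state between fork() and exec.

```cpp
#include <cassert>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

// Hypothetical helper: run `file` with `argv` in a child process and return
// its exit status, or -1 on failure. After fork(), the child calls nothing
// but execvp() and _exit().
int run_and_wait(const char *file, char *const argv[]) {
  assert(file != nullptr && argv != nullptr);

  pid_t pid = fork();
  if (pid < 0)
    return -1;            // fork failed

  if (pid == 0) {
    execvp(file, argv);   // replaces the process image on success
    _exit(127);           // only reached if exec failed
  }

  int status = 0;
  if (waitpid(pid, &status, 0) < 0)
    return -1;
  return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

Calling `run_and_wait("true", argv)` with `argv = {"true", nullptr}` would return the exit status of the `true` utility, assuming it is on the PATH.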

rui314 · on Feb 23, 2021

Author here. Good point. I ended up not using fork() without exec(), so that should be fine now, but here is my original plan to use fork():

I wanted to keep a linker process running as a daemon so that it doesn't read the same files over and over again. After loading input files, the linker becomes a daemon and calls fork() to create a worker process. Then the worker process does the rest of linking. In other words, a daemon is a "clean" copy of a linker process image, and each child is specialized for each actual linker invocation.

It turned out that the linker runs much slower with fork() because of the overhead of copy-on-write. You cannot keep a fresh copy of a process just by calling fork() for free. There's a tax associated with it. I tried to work around it, but in the end I had to give up with the fork()-based worker process design.

cout · on Feb 23, 2021

I wonder if the tax could be reduced with huge pages. Much of the cost of COW is the large number of page faults, but with huge pages you could reduce the number of faults to 1.

dataflow · on Feb 23, 2021

> in the end I had to give up with the fork()-based worker process design.

Honestly that's a good thing, or your program just fundamentally couldn't possibly ever work on native Windows!

pjmlp · on Feb 23, 2021

Or mainframe OSes.

tobylane · on Feb 23, 2021

Not my area. Does compiling happen on mainframes these days?

pjmlp · on Feb 23, 2021

For sure, IBM i, z/OS, and Unisys ClearPath have kept their own programming languages, in addition to Java, C and C++, both on their native environments as well as POSIX application layer.

Modern mainframe development environments are browser based.

eek04_ · on Feb 23, 2021

Not possibly working on Windows is, for some of us, a feature. I have no wish to support that ecosystem due to the past ethical violations that founded it, and making it impossible to do so would avoid having to deal with it.

bla3 · on Feb 23, 2021

What about posix_spawn() with POSIX_SPAWN_USEVFORK? That saves some of the overhead. See eg https://github.com/rtomayko/posix-spawn#benchmarks

ComputerGuru · on Feb 23, 2021

posix_spawn is just a wrapper that takes care of setting common parameters for newly forked instances (eg pgrp) and prevents you from doing things that might be overly unsafe or could break vfork from being used in its optimized form. It’s implemented at the libc level, so it’s not a magic syscall that moves the burden of process spawning to the kernel.

_dh54 · on Feb 23, 2021

This doesn’t change your core point but just to be thorough, it’s actually a system call on a few systems, notably macOS.

bla3 · on Feb 23, 2021

Yes, but with that define it wraps vfork instead of fork.
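
For reference, a minimal sketch of what a posix_spawn-based launch looks like, assuming a POSIX system. The helper name `spawn_and_wait` is hypothetical. No spawn attributes are set here, so the POSIX_SPAWN_USEVFORK flag mentioned above is deliberately not used; whether libc uses fork, vfork or something else underneath is left to the implementation.

```cpp
#include <cassert>
#include <spawn.h>
#include <sys/types.h>
#include <sys/wait.h>

extern char **environ;  // the calling process's environment

// Hypothetical helper: spawn `file` with `argv` and return its exit status,
// or -1 on failure.
int spawn_and_wait(const char *file, char *const argv[]) {
  assert(file != nullptr && argv != nullptr);

  pid_t pid;
  // Null file actions and attributes: the child inherits the parent's
  // descriptors and default spawn behaviour.
  if (posix_spawnp(&pid, file, nullptr, nullptr, argv, environ) != 0)
    return -1;

  int status = 0;
  if (waitpid(pid, &status, 0) < 0)
    return -1;
  return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

Compared with a hand-written fork()/exec() pair, the caller never runs any code in the child, which is the restriction ComputerGuru describes.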

chubot · on Feb 23, 2021

I can't think of any reason a linker needs to use many libraries, especially not ones that start threads. A linker reads files and writes a file. It's a single-purpose tool, not a monolithic app with many dependencies.

Either way, threads should be controlled by the application and not libraries. Well-written libraries like sqlite and Lua are parameterized by I/O and concurrency. They don't read files and start threads behind your back.

_dh54 · on Feb 23, 2021

I agree with you on both points but when building large complex applications (or tools!) this stuff tends to unintentionally creep up over time and it’s not fun to debug non-deterministic bugs due to an opaquely broken implicit contract between you and a poorly written library that got pulled in transitively. Sometimes those libraries are unavoidable closed source system libraries (https://bugs.python.org/issue33725) and the issue ends up being not easily fixable. It’s just risk minimization.

chubot · on Feb 23, 2021

It's a moot point since the author already said he's not using fork(), but C and C++ apps don't transitively pull in dependencies, especially in the open source world.

That kind of thinking comes from node.js and Rust as far as I can tell.

There is never ever a situation where a program like GCC or Clang will acquire a third party dependency without a commit that explicitly changes the build system.

The Python example isn't really relevant because Python modules are shared libraries, but a linker doesn't have or need shared library plugins (i.e. dynamically linking against arbitrary machine code at runtime).

ssokolow · on Feb 23, 2021

Of course C and C++ apps transitively pull in dependencies.

First, pkg-config exists and most projects on Unixy OSes use it these days. It doesn't really make much sense to argue "it all has to be specified on the command-line" when the command-line the developer cares about looks like this:

gcc `pkg-config --cflags gtk+-3.0` -o test test.c `pkg-config --libs gtk+-3.0`

(That adds things like `-lcairo` which aren't at all visible at first glance)

Second, why do you think things like libwayland-client, libfreetype, and libexpat show up in the ldd output? They're certainly not in the pkg-config output.

Likewise, nothing says a library has to bump its major version number when it adds a new backend dependency, which means that a distro may add a transitive dependency through a shared library update that doesn't correspond to you bumping your version.

(You can gain new backend dependencies without even recompiling anything you're responsible for.)

Heck, this post explicitly touches on that disconnect between what you add to your build plan and what ldd lists for C and C++ codebases.

https://wiki.alopex.li/LetsBeRealAboutDependencies

chubot · on Feb 23, 2021

That is a good point... I overstated the case, although I would say the GCC and Clang examples in particular are true. I guess the stronger argument is that compilers are more of a special case low in the dependency stack, as opposed to apps (and especially GUI apps). Also, I really hope none of those dependencies start long-lived threads without the app's knowledge, though there's nothing stopping them from doing so.

_dh54 · on Feb 23, 2021

In my example, using Apple’s system carbon libraries at the C level transitively pulls in their multithreading framework. It does happen. Everything that can break will break, that attitude is necessary to build robust software.

082349872349872 · on Feb 23, 2021

Forget libraries, a static linker (back in the day) only needed something like half-a-dozen different syscalls, not even a full userland or anything as bougie as libraries.

(if it's unclear, this is strong agreement with chubot. If you make the problem harder than it has to be, you have only yourself to blame...)

FeepingCreature · on Feb 23, 2021

Damn, the idea of string interning symbol names in a preload pass is tight. That's one of those sentences that you read and from then on it's just obviously the right way to write a linker.

cout · on Feb 23, 2021

I was surprised to learn that this isn't how linkers work. Apparently one of the reasons c++ template code is so slow to link is long symbol names.

jdblair · on Feb 23, 2021

From a marketing perspective, "mold" meaning "a form used to cast an object from liquid" is a lot more appealing than "green fungus growing on bread." A mold for casting objects is also a lot closer, metaphorically speaking, to what a linker does.

I honestly thought that was the meaning the author was trying to evoke before I saw the picture on the github page.

rui314 · on Feb 23, 2021

Author here. Haha, that's perhaps true. But at the same time, it seems like a tradition to give a silly name (e.g. "git") to a tool, and I actually like that name and the image to show that I'm not too serious. This is a fun project but not ready for production use.

XorNot · on Feb 23, 2021

Given that C was named because it was derived from B, I think there's an effective tradition in compsci of naming things by just playing with the letters.

M comes after G so the name tracks perfectly while also being weird and distinctive.

cb321 · on Feb 23, 2021

Sounds like you just need a multi-bump cake/jello mold like [1] with multiple "input spigots" pouring in with a "fast harden" aspect to have the perfect logo/name combo. Not sure how to convey rapid hardening with simple art, though... :-)

EDIT: You may just have to settle for speed/parallelism being conveyed by 2..3 spigots pouring in. :-) It's perfect - you can stay with moldy bread while it is a major work in progress and evolve to the more finished logo when your own work is "hardened" -- all without changing the name. ;-)

[1] https://www.foodandwine.com/cooking-techniques/baking/best-b...

hobofan · on Feb 23, 2021

> that name and the image to show that I'm not too serious. This is a fun project but not ready for production use.

I'm not sure if that image is the best way to communicate that status, given that the sudo sandwich logo exists (which coincidentally bears some resemblance to your moldy bread). A big bold "not ready for production" at the top of the README is probably a better way to achieve that.

GrinningFool · on Feb 23, 2021

The author has no such obligation, and "ready for production" is something that you would want to verify for yourself based on your evaluation of the project and your requirements.

andrewshadura · on Feb 23, 2021

Or it could be "bold".

koolba · on Feb 23, 2021

Or “weld”.

Which I think might just be the greatest name for a new linker.

marco_craveiro · on Feb 23, 2021

Winner.

rui314 · on Feb 23, 2021

For the record, a candidate for another name was "weld" as it joins pieces of data into a single binary. That's a good name, I think, but I couldn't come up with a backronym.

pwdisswordfish6 · on Feb 23, 2021

That's fine, because it's not exactly clear what "ld" stands for to begin with. If anyone asks, you could just say that the W and E stand for "Weally Efficient".

quesera · on Feb 23, 2021

Boston says: "Wicked Efficient"

dralley · on Feb 23, 2021

Whickhed

siithy · on Feb 24, 2021

How about "wildly experimental linking device"?

Edit: Had fun thinking up a couple others:

"whimsically eclectic logical design"

"willfully egregious lackadaisical decision"

"wise element location director"

"world exploring layout detector"

"wrong-headed exasperating liability defender"

"workaday execution layer developer"

"wonderfully elegant logistical delegator"

formerly_proven · on Feb 23, 2021

wildly exhilarated link editor

felixr · on Feb 23, 2021

I was going to say: but the casting template is spelled 'mould' not 'mold'.

Looked it up and realised that the US spelling is actually also 'mold' https://www.oxfordlearnersdictionaries.com/definition/englis...

kleiba · on Feb 23, 2021

Moreover, the fungus is spelled 'mould' in British English: https://dictionary.cambridge.org/dictionary/english/mould

brigandish · on Feb 24, 2021

Why wouldn't the fungus be spelt mould in British English? I'm just wondering if there's a reason beyond the natural surprise of finding variations, not trying to be argumentative.

I've just been interrupted while writing this to be told that America and Britain number calendar weeks differently (Britain follows ISO[1]) and that Apple's calendar is fixed to the US version. It never ends…!

[1] https://en.wikipedia.org/wiki/ISO_week_date

ZephyrBlu · on Feb 23, 2021

You gotta love the English language. Contradictions abound.

__alexs · on Feb 23, 2021

I wonder if this is another Buffalo situation. Mold is also a place name...

Twisol · on Feb 23, 2021

On the other hand, Rust was apparently named after the fungus, not iron oxide. There seems to be a new category of low-level tools named after life forms that grow in ecological niches :D

cout · on Feb 23, 2021

How interesting. Doesn't the fungus get its name from the oxidized metal though and not the other way around? Mold the form and mold the fungus seem etymologically unrelated, however.

Twisol · on Feb 23, 2021

> Doesn't the fungus get its name from the oxidized metal though and not the other way around?

Huh, apparently so! So Rust-the-language isn't necessarily named after oxidized metal, but it _is_ named after something that's named after oxidized metal.

agumonkey · on Feb 23, 2021

And what's the relationship between the programming language and fungi ? is it a root growing invading the world concept ? is it a pun on 'fun'-'gi' ?

steveklabnik · on Feb 23, 2021

https://www.reddit.com/r/rust/comments/27jvdt/internet_archa... is the primary source here. Quotes:

> <graydon> fungi are amazingly robust

> <graydon> to start, they are distributed organisms. not single cellular, but also no single point of failure.

However, I will note that graydon uses words like "I think I" and "I remember kinda" and everyone says "yes this means this is 100% the source of the name" whereas I take it to be like, "this is one of many reasons that Rust is named Rust."

You also have to remember that Rust was a very different language in many senses back when Graydon was choosing the name, so allusions may not make sense now but may have then. Early Rust was much more erlang-like, which may make the above feel more relevant.

Supersaiyan_IV · on Feb 25, 2021

That's funny because BFD linker's original meaning was Big F**ing Deal. Linkers sure are marketable oddballs.

nothasan · on Feb 23, 2021

Would emulsifier roll off the tongue better?

cout · on Feb 23, 2021

It would likely get shortened to emu, and then we would all be wondering if it's an emulator or a large bird.

mst · on Feb 23, 2021

Also, everybody in Australia would be leery of using it lest it win a war against them again.

XorNot · on Feb 23, 2021

I just want to say that this is one of the most interesting README.md's I've ever read in a project.

Those performance improvements are crazy as well: here's hoping this becomes the standard.

jeffrallen · on Feb 23, 2021

Yes!!!! This is excellent tool engineering, choosing the right target and setting a high but attainable target which fundamentally drives the design. I love the idea that `mold` "competes" with `cat`. That right there is genius framing of the problem.

joosters · on Feb 23, 2021

If we mmap a lot of files, _exit(2) is not instantaneous but takes a few hundred milliseconds because the kernel has to clean up a lot of resources. As a workaround, we should organize the linker command as two processes; the first process forks the second process, and the second process does the actual work. As soon as the second process writes a result file to a filesystem, it notifies the first process, and the first process exits. The second process can take time to exit, because it is not an interactive process.

Is this safe? Are you sure that the _exit() delays are not part of the kernel committing all the pending mmap I/O to the buffer caches? If a build script links a binary using this, and then immediately executes it, the second (background process) might still not be finished. Is it guaranteed that all of the mmap I/O will be visible - or will the binary appear incomplete?

I don't know the answer to this myself - it would be clear cut if the linker was using write() calls, because UNIX guarantees that future read() calls will see the results, but the ordering guarantees with mmap I/O are far more loose, I believe.

rui314 · on Feb 23, 2021

It is safe because the child process calls munmap before telling its parent process to exit. munmap is guaranteed to act as a commit operation. Alternatively, you can call msync (https://man7.org/linux/man-pages/man2/msync.2.html) if you want to keep it mmapped.
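
A minimal sketch of the two-process scheme discussed here, assuming a POSIX system and a single-threaded parent. The helper name `write_in_child` and the one-byte pipe protocol are hypothetical, not mold's actual code. The child builds the output through a shared mapping and, following the reply above, unmaps it before notifying the parent.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <fcntl.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

// Hypothetical helper: a child process writes `size` bytes to `path` via
// mmap; the parent returns as soon as the child reports that the mapping
// has been unmapped. Returns true on success.
bool write_in_child(const char *path, const void *data, std::size_t size) {
  assert(size > 0);  // mmap rejects zero-length mappings

  int fds[2];
  if (pipe(fds) != 0)
    return false;

  pid_t pid = fork();
  if (pid < 0) {
    close(fds[0]);
    close(fds[1]);
    return false;
  }

  if (pid == 0) {
    // Child: does the actual work.
    close(fds[0]);
    char ok = 0;
    int fd = open(path, O_RDWR | O_CREAT | O_TRUNC, 0644);
    if (fd >= 0 && ftruncate(fd, static_cast<off_t>(size)) == 0) {
      void *buf = mmap(nullptr, size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
      if (buf != MAP_FAILED) {
        std::memcpy(buf, data, size);
        ok = (munmap(buf, size) == 0);  // unmap before notifying the parent
      }
    }
    if (write(fds[1], &ok, 1) != 1)
      _exit(1);
    // Any slow teardown happens from here on, after the parent was notified.
    _exit(ok ? 0 : 1);
  }

  // Parent: wait only for the notification, not for the child to exit.
  close(fds[1]);
  char ok = 0;
  ssize_t n = read(fds[0], &ok, 1);
  close(fds[0]);
  return n == 1 && ok == 1;
}
```

In a real linker the parent would exit immediately after the notification instead of returning; this sketch returns so the result can be inspected.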

amluto · on Feb 23, 2021

Linux gives much stronger guarantees than POSIX here. I wonder if you save measurable time by skipping munmap.

ncmncm · on Feb 24, 2021

munmap is often a remarkably slow operation, if your process is multi-threaded, because of TLB shootdowns; on each munmap, all the other threads get paused and their page map caches get trashed, each time.

It is usually much better to have multiple regular processes, instead of threads, that only share chosen mappings, if you want to use munmap. Or, you can terminate and join all your threads before you start munmapping.

rui314 · on Feb 23, 2021

Is that documented?

amluto · on Feb 23, 2021

I would have said yes, but I can’t find it. That being said, Linux has a “unified page cache”, and MAP_SHARED is coherent with read(2) and write(2), at least on any local filesystem (not sure about FUSE) and when direct IO is not involved.

That being said, I could easily believe that largeish pwrite(2) calls would be comparably fast compared to mmap, since mmap needs to play with page tables, and page faults on x86 are expensive. MAP_POPULATE would also be worth trying if you’re not already using it.

I assume that copy_file_range(2) is out of the question due to relocations.

rui314 · on Feb 24, 2021

I once counted the number of 4 KiB blocks that have at least one relocation. I used Chrome as a sample. It turned out that almost all 4 KiB blocks have at least one relocation. They mutate everywhere.

brokencode · on Feb 23, 2021

Is it possible to use alternate linkers with Rust? Anything that helps Rust compile faster is pretty cool, especially if it allows better multithreaded scaling.

rui314 · on Feb 23, 2021

I believe the Rust compiler can use an alternative linker, but I'm pretty sure that mold can't link Rust programs because it lacks lots of features. It's so experimental that I didn't even think about trying to link a Rust program with it.

lankalanka · on Feb 23, 2021

Sure you can. Please refer to the doc of zld on how to use zld as an alternative linker for Rust as an example:

https://github.com/michaeleisel/zld#if-using-rust

k_bx · on Feb 23, 2021

Yes, put this in your ~/.cargo/config.toml:

[target.x86_64-unknown-linux-gnu]

linker = "ld.lld"

zamalek · on Feb 23, 2021

And it does help compilation times quite a bit.

saagarjha · on Feb 23, 2021

It’s interesting to see where the bottlenecks are. Some of these are clearly algorithmic and many are solved with simplifying the work or using concurrency, but a lot are platform-specific and nonobvious.

gravypod · on Feb 23, 2021

This page is extremely correct about how extremely under-documented linker scripts are. I wish I could learn more about them since you can do a lot of C/C++ magic.

dbrgn · on Feb 23, 2021

This might be of interest to you: https://twitter.com/theavalkyrie/status/1349458442734469123

volta83 · on Feb 24, 2021

> the most important thing is to fix the layout of an output file as quickly as possible, so that we can start copying actual data from input object files to an output file as soon as possible.

Why do we need to copy data at all ?

Can't we just have an object file with pointers to other files ?

Like sure, if I ever want to ship my binary somewhere else, I'd like to do this. But for local interactive development, all the files are already in my local machine, so I don't know why we would need to create a second copy of them within some other file.

Tyr42 · on March 2, 2021

Hm, well what happens when you execv an executable then? Everything needs to get copied into RAM and then jumped to, so you would be putting a linker implementation into a syscall, right? And if it's just as easy as concatenating the files, then just do that in userspace.

teddyh · on Feb 23, 2021

I am not sure that multi-threaded is the way to optimize file writes; I would have thought that leaving as much as possible up to the kernel, by using writev(2), splice(2), etc. would be best.

rui314 · on Feb 23, 2021

I chose mmap based on benchmarking. Writing a 2 GiB memory buffer to a file using write(2) was slower than directly constructing file contents to mmap'ed memory region.

What is more interesting is the real bottleneck was not about mmap vs write(v) but in the filesystem. If you create a fresh file and write 2 GiB of data to that file, it takes like 700 milliseconds on my machine (ext4 fs), but if you write the same amount of data to an existing 2 GiB file, the IO speed doubles. So, the filesystem's performance to allocate new disk blocks seems to limit the performance of my linker. That reminded me of the axiom: don't guess about performance but measure.

teddyh · on Feb 23, 2021

> So, the filesystem's performance to allocate new disk blocks seems to limit the performance of my linker.

If you know the total size ahead of time, you might try using fallocate(2).

rui314 · on Feb 23, 2021

I actually tried fallocate(2) but it didn't change the performance characteristics at all. It doesn't seem to do what its man page says.

mst · on Feb 23, 2021

I wonder if that's filesystem/OS dependent.

pjmlp · on Feb 23, 2021

That is only a solution if you want to hardcode it to a specific OS.

fanf2 · on Feb 23, 2021

Assuming that the linker script(s) used by a program don’t change very often, it might be interesting to compile them, in effect building a custom linker specialized to the script(s). Most programs don’t specify a linker script so maybe the linker can be specialized just to the usual scripts, and odd cases can use post-link binary editing.

ncmncm · on Feb 24, 2021

Standard C++ has parallel algorithm primitives so there's no need to depend on TBB anymore.

You can give each thread its own malloc that just mmaps what it needs, if you are leaking everything anyway.

This is a case where using a raw new() and raw pointers is much better than using a smart pointer, because touching things to run destructors unnecessarily is itself expensive.

fwsgonzo · on Feb 23, 2021

I would use this in a heartbeat if you make execute-only a first-class feature. That means segments with E only, no reading.

My experience with every linker so far is that to have XO I will need to specify the memory layout in a linker script manually. It's not as nice as a simple linker argument, if that is possible.

rui314 · on Feb 23, 2021

By execute-only segment, you mean a segment which is not readable but executable, right?

If so, that's a relatively new CPU security feature. I think some ARM processors support it, but AFAIK x86 doesn't support it at the moment. On x86, if you make a page executable, it automatically makes the page readable. R and X bits are not separated in the page table. I bet Intel and AMD will add an NR bit (no read bit - analogous to NX bit) soon, though.

fwsgonzo · on Feb 23, 2021

Intel slide deck here: https://linuxplumbersconf.org/event/4/contributions/283/atta...

If they really support it, I didn't pay too close attention, but at least it's on their mind.

rui314 · on Feb 24, 2021

Ooh, I didn't know that x86 had such a feature. But is this virtualization only?

hctaw · on Feb 23, 2021

Overwriting executables during a build rather than copying them at the end can be problematic if a build artifact is used to bootstrap itself (like a compiler). If the build fails you can't try again.

enriquto · on Feb 23, 2021

So happy to see progress in linkers!

Can this one statically link shared libraries? That would be a great feature.

EDIT: I realize that such a feature would necessarily involve a fair amount of black magic. But it does not seem an impossible endeavor.

rui314 · on Feb 23, 2021

Are you asking if mold can link an executable file and its depending .so files into a single binary? If so, neither mold nor other major linkers can do that.

enriquto · on Feb 23, 2021

Exactly. That would be a pretty big deal and definitely a killer feature. Equivalently, a tool for converting .a to .so

spacechild1 · on Feb 23, 2021

> Equivalently, a tool for converting .a to .so

Well, an .a file is just an archive of object files. Turning several object files into a single binary is pretty much what a linker does. However, you can't really automate this because object files might have external dependencies and the linker needs to know what these dependencies are.

enriquto · on Feb 23, 2021

> you can't really automate this because object files might have external dependencies and the linker needs to know what these dependencies are

If it can be done at runtime of the program, I guess it can be done at runtime of the linker, doesn't it?

spacechild1 · on Feb 23, 2021

Technically, Linux and macOS allow building a shared library with undefined symbols and letting the loader figure it out, but I wouldn't recommend it (it's easy to miss linker errors). On Windows, however, this is not possible.

EDIT: also, building with unresolved symbols requires the host application to know about and link all the required external libraries, which is usually not what you want...

higerordermap · on Feb 23, 2021

Yeah I have the same question. lld can list the dependencies right?

spacechild1 · on Feb 23, 2021

Yes, because someone put them there ;-)

malkia · on Feb 23, 2021

Since perf is of utmost importance for this project, and intern has been found to be used a lot, maybe one small optimization is to move the static ConcurrentMap out of the function, hence avoiding the atomic<bool> check on whether it's initialized -

  static Symbol *intern(std::string_view name) {
    static ConcurrentMap<Symbol> map;
    return map.insert(name, {name});
  }

to

  static ConcurrentMap<Symbol> map;

  static Symbol *intern(std::string_view name) {
    return map.insert(name, {name});
  }

probably though it won't bring much, but for the sake of squeezing every bit out there (and it was the easiest I could find - lol)

malkia · on Feb 23, 2021

Wait, this code can't work - as you are holding only a string_view in the Symbol...

  // __start_ and __stop_ symbols
  for (OutputChunk *chunk : chunks) {
    if (is_c_identifier(chunk->name)) {
      start(Symbol::intern("__start_" + std::string(chunk->name)), chunk);
      stop(Symbol::intern("__stop_" + std::string(chunk->name)), chunk);
    }
  }

rui314 · on Feb 24, 2021

Ah, that's a bug. Thank you for finding it!

ArchOversight · on Feb 23, 2021

This does seem broken, since the temporary string that is created by the concatenation is short lived, and the std::string_view inside the Symbol won't hold on to it, just the pointer to the data.
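
One way to avoid the dangling view, sketched under assumptions: the intern table owns a copy of every name, and each Symbol's string_view points into that owned copy rather than into the caller's buffer. This is a single-threaded illustration built on std::unordered_map, not mold's ConcurrentMap; the `Symbol` layout and the `intern` signature here are simplified stand-ins.

```cpp
#include <cassert>
#include <string>
#include <string_view>
#include <unordered_map>

struct Symbol {
  std::string_view name;  // points into storage owned by the intern table
};

// Hypothetical single-threaded intern function. The map key is an owned
// std::string, and unordered_map never moves its nodes, so a view of the
// key stays valid for the lifetime of the table.
Symbol *intern(std::string_view name) {
  static std::unordered_map<std::string, Symbol> table;
  auto [it, inserted] = table.try_emplace(std::string(name));
  if (inserted)
    it->second.name = it->first;  // view of the key stored in the map node
  assert(it->second.name == name);
  return &it->second;
}
```

With this shape, a call such as `intern("__start_" + std::string("foo"))` is safe: the temporary is copied into the table before it is destroyed, at the cost of one allocation per distinct name.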

malkia · on Feb 23, 2021

Or this:

for (std::string_view arg : config.version_script) parse_version_script(std::string(arg));

malkia · on Feb 23, 2021

Although if somehow nothing gets really freed, and std::string() is kept intact after calling the destructor (e.g. its data() is still valid) then it'll work :)

ArchOversight · on Feb 23, 2021

`parse_version_script` is defined as taking an `std::string`, std::string() over an `std::string_view` will create a new std::string with a copy of the data from arg.

KingOfCoders · on Feb 23, 2021

Can this already be used with Rust? I currently use LDD which already brought some speed up.

rjsw · on Feb 23, 2021

Did you mean lld ?

KingOfCoders · on Feb 24, 2021

Yes, thx!

Ecco · on Feb 23, 2021

I’m surprised by the statement that ‘cat’ is slow because it’s not multithreaded.

Isn’t ‘cat’ io-bound?

IshKebab · on Feb 23, 2021

I think his point is that you can match `cat`'s speed because it is IO bound and you can make a linker IO bound by using lots of threads.

baruch · on Feb 23, 2021

One way to speed up IO-bound code is to have multiple threads-of-execution (threads or async-io) in order to have multiple requests in the air. Specifically SSDs benefit a lot from multiple requests in parallel as that utilizes their internal parallelism. HDDs would benefit only a little bit but RAIDed HDDs can also benefit from parallelism.

082349872349872 · on Feb 23, 2021

It used to be lore in the industry that any decent linker is IO bound, including the single-threaded ones.

kasabali · on Feb 23, 2021

thethirdone · on Feb 23, 2021

Not caring about memory leaks is not the same as not caring about memory consumption. The next line shows exactly why the choice is made.

> It is because most objects that are allocated during an execution of mold are needed until the very end of the program.

Using free would not affect that memory consumption much because most consumption would be freed at the end of the program.

zamalek · on Feb 23, 2021

Raymond Chen has an amusing story about this[1]:

> Since the missile will explode when it hits its target or at the end of its flight, the ultimate in garbage collection is performed without programmer intervention.

[1]: https://devblogs.microsoft.com/oldnewthing/20180228-00/?p=98...

tjalfi · on Feb 27, 2021

I also submitted it here in 2017[0].

[0] https://news.ycombinator.com/item?id=14233542

kasabali · on Feb 23, 2021

user-the-name · on Feb 23, 2021

But, again, that line does not, in any way, say that this project does not care about memory consumption.

hyperman1 · on Feb 23, 2021

There might be a 'best of both worlds' idea in there: If mold mmaps the input files instead of reading them, the linker would not trash all cached files, and be faster itself as it reuses already cached files.

Now the author seems a smart fellow, so maybe he did just that already, I didn't check the source.

rui314 · on Feb 23, 2021

mold uses only mmap for file IO. Not only the input files but also the output file is mapped to memory using mmap(2).
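
For readers unfamiliar with the pattern, here is a minimal sketch of mapping both an input and an output file with mmap(2) on a POSIX system. It is not mold's code: `mmap_copy` is a hypothetical helper that just copies bytes, where a linker would instead write sections and apply relocations into the output mapping.

```cpp
#include <fcntl.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

#include <cassert>
#include <cstddef>
#include <cstring>

// Copies `src` to `dst` using mmap(2) for both files.
// Returns true on success. Hypothetical helper, not taken from mold.
bool mmap_copy(const char* src, const char* dst) {
    assert(src != nullptr && dst != nullptr);

    int in_fd = open(src, O_RDONLY);
    if (in_fd < 0) return false;

    struct stat st;
    if (fstat(in_fd, &st) != 0) {
        close(in_fd);
        return false;
    }
    std::size_t size = static_cast<std::size_t>(st.st_size);

    int out_fd = open(dst, O_RDWR | O_CREAT | O_TRUNC, 0644);
    if (out_fd < 0) {
        close(in_fd);
        return false;
    }

    bool ok = true;
    if (size > 0) {  // mmap rejects zero-length mappings
        // The output file needs its final size before it can be mapped.
        ok = ftruncate(out_fd, st.st_size) == 0;

        void* in = MAP_FAILED;
        void* out = MAP_FAILED;
        if (ok) {
            in = mmap(nullptr, size, PROT_READ, MAP_PRIVATE, in_fd, 0);
            // MAP_SHARED makes stores visible in the file itself.
            out = mmap(nullptr, size, PROT_READ | PROT_WRITE, MAP_SHARED,
                       out_fd, 0);
        }
        ok = in != MAP_FAILED && out != MAP_FAILED;
        if (ok) std::memcpy(out, in, size);

        if (in != MAP_FAILED) munmap(in, size);
        if (out != MAP_FAILED) munmap(out, size);
    }

    close(in_fd);
    close(out_fd);
    return ok;
}
```

The input pages come straight from the page cache, so files that are already cached are not read again, and the kernel writes the dirty output pages back on its own schedule.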

baruch · on Feb 23, 2021

Wouldn't this mean that you need to memcpy around? Maybe async-io (io_uring?) could help here by doing zero-copy writes directly from the source. I don't know how much you need to mangle code (GOT/PLT, offsets) and how much it is a straight copy of object to object.

_a1_ · on Feb 23, 2021

> As an implementation strategy, we do not care about memory leak because we really can't save that much memory by doing precise memory management. It is because most objects that are allocated during an execution of mold are needed until the very end of the program. I'm sure this is an odd memory management scheme (or the lack thereof), but this is what LLVM lld does too.

The fact that someone does it wrong doesn't mean that we should do it wrong as well ;(

I don't think I like this approach. It may work now, but will probably seriously limit the possibilities in the future.

klodolph · on Feb 23, 2021

If you think this approach is wrong, could you articulate the reasons why you think it is wrong? This is a classic memory management strategy... if a program is running as a batch program, all memory will be freed when the program exits. Any alternative memory management strategy would have to free and then reuse memory in order to show improvement. If it's a small amount of memory freed, or if the memory is unlikely to be reused, the benefits of freeing memory are smaller and it may actually slow the program down.

The fact that this program can successfully link Chrome means that we have fairly solid baseline performance metrics we can use for "big" programs. Chrome is just about the largest program you might ever need to link.

spijdar · on Feb 23, 2021

Yeah, this is also the strategy GCC and co use generally AFAIK. In a program like GCC where a single invocation will operate over a single file/unit, there's just not much benefit to trying to re-use data; if GCC or LLVM were closer to "build servers" with persistent state that compiled and linked objects on demand then it'd make sense, but in their current model, it's easier and safer to just keep data around.

rui314 · on Feb 23, 2021

Another classic example is Apache's memory pool. In Apache, you allocate memory from memory pools associated with the current request or connection. Memory pools are freed as a whole when a request or a connection is complete. mold's memory management scheme is not very different from that if you think of the entire linker as a single "session" which uses a single memory pool.
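
To make the pool idea concrete, here is a small bump-allocator sketch. It is an illustration only, not Apache's or mold's implementation: the `Arena` class and its member names are made up, individual objects are never freed, and all memory goes away at once when the arena is destroyed.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <memory>
#include <new>
#include <type_traits>
#include <utility>
#include <vector>

// Hypothetical pool: allocations are bumped out of large chunks and
// released together when the Arena goes out of scope.
class Arena {
public:
    explicit Arena(std::size_t chunk_size = 64 * 1024)
        : chunk_size_(chunk_size) {}
    Arena(const Arena&) = delete;
    Arena& operator=(const Arena&) = delete;

    // Supports power-of-two alignments up to alignof(std::max_align_t).
    void* allocate(std::size_t size,
                   std::size_t align = alignof(std::max_align_t)) {
        assert(align != 0 && (align & (align - 1)) == 0);
        assert(align <= alignof(std::max_align_t));

        std::size_t start = (used_ + align - 1) & ~(align - 1);
        if (chunks_.empty() || start + size > capacity_) {
            // Start a new chunk; oversized requests get a chunk of their own.
            capacity_ = size > chunk_size_ ? size : chunk_size_;
            chunks_.push_back(std::make_unique<std::byte[]>(capacity_));
            start = 0;
        }
        used_ = start + size;
        return chunks_.back().get() + start;
    }

    // Constructs a T inside the arena. Destructors are never run, so only
    // trivially destructible types are accepted.
    template <typename T, typename... Args>
    T* create(Args&&... args) {
        static_assert(std::is_trivially_destructible_v<T>,
                      "Arena never calls destructors");
        return new (allocate(sizeof(T), alignof(T)))
            T(std::forward<Args>(args)...);
    }

    std::size_t chunk_count() const { return chunks_.size(); }

private:
    std::size_t chunk_size_;
    std::size_t capacity_ = 0;  // size of the current chunk
    std::size_t used_ = 0;      // bytes handed out from the current chunk
    std::vector<std::unique_ptr<std::byte[]>> chunks_;
};
```

With one arena per request, or one per process run, "freeing" is just destroying the arena, which matches the single-session view described above.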

_a1_ · on Feb 23, 2021

> If you think this approach is wrong, could you articulate the reasons why you think it is wrong?

Because later if you want to reuse parts of the code in a continuous environment (e.g. a daemon), then you will be surprised that you have memory leaks all over the place (or worse, someone else will discover it by accident).

I don't have a problem with the end-of-process-releases-all-memory optimization. But I had the impression that the author uses let's-worry-about-leaks-later-because-OS-takes-care-of-it-for-free-(in-my-use-case).

The best approach would be to create a memory pool with fast allocation (e.g. TLAB allocation in Java, or how computer games do it), in order to have control over how and when the memory is freed.