Runtime code generation and execution in Go

fl0ki · 2024-05-29T13:53:39.000000Z

I didn't know Go reserved some registers for its own use. For libraries that don't use cgo but do use assembly [1], does this mean they have to generate that assembly with a compiler flag that avoids reserved registers?

[1] A common optimization technique, like in the superb https://github.com/klauspost/compress

yencabulator · 2024-05-29T23:12:25.000000Z

Generally the inline assembly is hand-written and must follow the platform rules, see "Architecture-specific details" in https://go.dev/doc/asm

The few tools that generate that asm, such as https://github.com/mmcloughlin/avo , would also have to follow the rules.

You're not meant to take C source, crank it through GCC, and slurp that in as inline asm in Go.

Background: https://www.youtube.com/watch?v=KINIAgRpkDA

jhoechtl · 2024-05-29T16:08:39.000000Z

Pusha popa and never yield back in between should suffice?

fl0ki · 2024-05-29T16:58:35.000000Z

Non-CGO goroutines are preemptible via signals so there may still be more to it than that. If anyone knows the precise mechanism used I'd be very keen to hear it.

evacchi · 2024-05-29T13:53:45.000000Z

@ncruces and I are also contributors to wazero, so ask away if you have questions :)

kgeist · 2024-05-29T17:05:20.000000Z

I wonder how safe their (wazero's) approach is from a security point of view. According to the article (and from my own study of the Go runtime), the Go runtime is quite finicky when it comes to executing foreign code, because such code can easily interfere with the goroutine scheduler, GC, etc. (unless you use CGO, which deals with it, but they don't use it). As the article explains, it's easy to introduce random crashes by, say, innocently using a "wrong" register. Also, the Go runtime can change considerably from version to version (say, when they introduced signal-based preemptive scheduling, or when they moved away from segmented stacks), so today wazero may work OK, but with a new Go version it may unexpectedly crash or corrupt memory in some subtle way. Considering that the whole point of WASM is sandboxing, reading this article didn't make me feel very confident about the project.

ncruces · 2024-05-29T20:00:33.000000Z

If it gives you some confidence, wazero (the Go Wasm runtime) is used by the Go team (in CI) to test their own Go-to-Wasm compiler.

So if they break wazero, they'll know pretty soon from failing tests of their own Wasm/WASI toolchain.

https://go.dev/blog/wasi

https://github.com/golang/go/blob/go1.22.3/misc/wasm/go_wasi...

As for sandboxing, wazero tends to take sandboxing rather more seriously than other similar runtimes.

Memory sandboxing is implemented through explicit bounds checks (rather than memory mapping and guard pages), which allows denser deployments and requires less runtime support.
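To make "explicit bounds checks" concrete, here is a minimal sketch in Go. It is not wazero's code (the compiler emits the equivalent check in machine code), and the names linearMemory and loadUint32 are made up for illustration. The point is only that every access is validated against the memory's length before it touches any bytes, with the arithmetic done in 64 bits so a large address cannot wrap around.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// linearMemory is a hypothetical stand-in for a Wasm linear memory.
type linearMemory struct {
	data []byte
}

// loadUint32 reads a little-endian uint32 at addr. It reports false
// instead of reading past the end of the memory.
func (m *linearMemory) loadUint32(addr uint32) (uint32, bool) {
	// Widen to 64 bits first so addr+4 cannot overflow.
	end := uint64(addr) + 4
	if end > uint64(len(m.data)) {
		return 0, false
	}
	return binary.LittleEndian.Uint32(m.data[addr:end]), true
}

func main() {
	m := &linearMemory{data: make([]byte, 64)}
	binary.LittleEndian.PutUint32(m.data[8:], 42)

	v, ok := m.loadUint32(8)
	fmt.Println(v, ok) // prints "42 true"

	_, ok = m.loadUint32(62) // only 2 bytes remain, so this is rejected
	fmt.Println(ok)          // prints "false"
}
```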

And, by default, the WASI implementation doesn't expose any capabilities (not even access to the system clock). You can also interject any relevant WASI host calls, and do your own filtering.

And if the compiler is not your cup of tea, you can also use the (much slower) interpreter.

kgeist · 2024-05-29T20:42:30.000000Z

That's good to hear that the Go team themselves test wazero. Thank you for clarifying.

metadat · 2024-05-29T15:07:59.000000Z

Really cool article. This sort of stuff is mind expanding for me!

I wonder what it would take to make a version that works on Windows. Will you indulge me?

ncruces · 2024-05-29T15:11:25.000000Z

wazero works fine on Windows.

In fact the exact same assembly works across Windows, macOS, Linux and FreeBSD: the calling convention used by the compiler is entirely custom, and all "host calls" are provided by wazero, so assembly ports fine across OSes.

metadat · 2024-05-29T15:16:28.000000Z

Perhaps I was unclear (or I'm confused now, haha):

> The following is the tiny demo of the runtime code generation and execution in Go. I assume you are on a Unix-like system like Linux or macOS on an AArch64 machine.

Is there a way to get tinydemo working on Windows?

ncruces · 2024-05-29T15:22:54.000000Z

Oh, OK.

You need to replace calls to syscall.Mmap with windows.VirtualAlloc [1], syscall.Mprotect with windows.VirtualProtect, etc.

You don't even need to change the ASM preamble.

[1]: https://pkg.go.dev/golang.org/x/sys/windows#VirtualAlloc
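For context, the Unix side being replaced looks roughly like the sketch below: map anonymous read-write memory, copy the machine code in, then flip the pages to read-execute. This is my own minimal version, not the article's code; mapExecutable and unmapExecutable are hypothetical names, and the sketch deliberately stops short of jumping into the buffer (that needs the assembly trampoline from the article). I'm assuming Linux here; other Unix-like systems may impose extra requirements on executable mappings. A Windows port would swap the two syscalls for the golang.org/x/sys/windows calls named above.

```go
package main

import (
	"fmt"
	"syscall"
)

// mapExecutable copies code into a fresh anonymous mapping and changes
// its protection from read-write to read-execute. It never runs the code.
func mapExecutable(code []byte) ([]byte, error) {
	buf, err := syscall.Mmap(-1, 0, len(code),
		syscall.PROT_READ|syscall.PROT_WRITE,
		syscall.MAP_ANON|syscall.MAP_PRIVATE)
	if err != nil {
		return nil, fmt.Errorf("mmap: %w", err)
	}
	copy(buf, code)

	// Drop write permission before granting execute permission.
	if err := syscall.Mprotect(buf, syscall.PROT_READ|syscall.PROT_EXEC); err != nil {
		_ = syscall.Munmap(buf)
		return nil, fmt.Errorf("mprotect: %w", err)
	}
	return buf, nil
}

// unmapExecutable releases a mapping returned by mapExecutable.
func unmapExecutable(buf []byte) error {
	return syscall.Munmap(buf)
}

func main() {
	// AArch64 "ret" (0xd65f03c0), little-endian.
	ret := []byte{0xc0, 0x03, 0x5f, 0xd6}

	buf, err := mapExecutable(ret)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	defer unmapExecutable(buf)

	fmt.Printf("mapped %d bytes as read+execute\n", len(buf))
}
```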

evacchi · 2024-05-29T15:49:09.000000Z

and for reference see how it's done in wazero https://github.com/tetratelabs/wazero/blob/c397a402ad17e495a...

WhyNotHugo · 2024-06-01T11:08:30.000000Z

I found this article very approachable and well written. Thanks for sharing.

syngrog66 · 2024-05-29T17:58:59.000000Z

[flagged]

zbentley · 2024-05-29T19:03:32.000000Z

This is a compiler. It's a compiler that executes at a weird time (during program startup), but it's a compiler nevertheless. It's exactly as secure, or not, as its input code and the runtime in which that code runs once compiled.

This isn't "code-gen" as in the xz debacle, where source code transformation was used to obfuscate malicious input. This is "code-gen" in a similar sense to what the JVM does at runtime, or what the Linux kernel does with BPF programs, or similar.

While compiler toolchains aren't exempt from security considerations (especially this one, in which failing to appropriately reverse-engineer the Go runtime can result in compiled code behaving as scarily as an evil library dlopen'd at runtime in C), they're a totally different class of tool/vulnerability from what was used to exploit xz.

syngrog66 · 2024-05-29T19:15:21.000000Z

JIT is a form of code gen. My claim was about code gen in general, of which JIT is a special case. And I know how compilers work. No "well akshewally..." lecture needed or requested, thanks.

And if you re-read carefully what I literally said I did not say Jia Tan used the exact same technique as the OA. You added that supposition and then, conveniently, attacked it. Straw man.

FOSS (i.e. rando stranger originating) software-based JIT is also a security "smell" at best. Tradeoff between potentially improving perf in some cases vs increasing the amount of exploitable complexity surface for attackers. And always making it harder to reason about what code does at runtime. Less JIT and less code gen, in general, is wise in a world with growing threats from expert attackers backed by nation state level resources. Simpler is safer.

ncruces · 2024-05-29T19:46:14.000000Z

Random stranger backed by company develops Wasm runtime in the open, that's used by Go authors themselves to test their own Go-to-Wasm compiler.

I'm sure other Wasm runtimes are much better.

neonsunset · 2024-05-29T14:46:54.000000Z

[flagged]

ncruces · 2024-05-29T15:03:28.000000Z

Comments extolling the virtues of LISP or the CLR in a Go thread are "not negative about Go"; they're simply off-topic.

An on-topic comment critical of Go would be: it's silly that cgo is considered so bad (why?) that people go to such great lengths to avoid it, when so many (otherwise great) Wasm runtimes already exist.

But if I'm already using Go and need a Wasm runtime, my options are wazero or wasmtime-go (etc), not LISP and C#.

yencabulator · 2024-05-29T23:19:49.000000Z

I would argue that attempting to hijack a random Go thread to complain about CGo is also off-topic. They can write a coherent article and post that, if they want, but repetitively distracting from the actual thing being discussed is tiring.

Also, CGo design is a trade-off, and makes perfect sense. Nobody has come up with a design to have both fast C ABI FFI (no stack/thread switch) and cheap green threads (small stacks). People complaining about CGo rarely seem to understand this.

raggi · 2024-05-29T15:07:24.000000Z

cgo is extremely slow, particularly in programs with many active goroutines.

I regularly look sad at profiles of production cgo programs where findRunnable is in the top 10.

ncruces · 2024-05-29T15:14:52.000000Z

Right, and both modernc and wazero are two highly sophisticated projects being used to avoid it.

But cgo being slow (and above all unpleasant to work with) is what I'd consider valid criticism of Go.

Go has other advantages, FFI is not one of them.

ctvo · 2024-05-29T17:46:53.000000Z

Can you toss a reference to modernc? I can't seem to find the project.

ncruces · 2024-05-29T19:41:44.000000Z

Most projects live in the gitlab below; modernc is the vanity import which alone doesn't resolve to anything. The most famous project is an SQLite driver.

https://gitlab.com/cznic

https://modernc.org/sqlite

metadat · 2024-05-29T15:09:46.000000Z

I thought Go could handle a million-plus active goroutines? Are you saying it'll work but be poor performance compared to something equivalent in C++, JavaScript, or Rust?

A quick search reveals cgo is mainly slow due to overhead when invoking non-pure go functions (i.e. C-functions).

https://stackoverflow.com/questions/28272285/why-cgos-perfor...

How does this relate and interact with goroutines in a performance impacting way? As in, why is the case with goroutines any different?

Edit: @neonsunset: Thanks <3. The thread you linked covers erl/BEAM and C#/dotnet, and makes salient points of skepticism about the practical need for millions of routines. I'm still curious what makes the Go story worse or different?

jerf · 2024-05-29T15:45:00.000000Z

Go can "handle" a million plus active goroutines, but that doesn't mean Go can handle a million goroutines doing literally anything a goroutine can do. They still have to fit into the resources of the computer. They can not all, for instance, have a gigabyte of their own active memory that they are doing something with, even though Go and a goroutine can individually "handle" a gigabyte of active working set.

Cgo calls take a lot of resources, or, at least, take a lot of resources if you're trying to do a million of them simultaneously. (In absolute terms, Cgo slowdown is one of those things that programmers hear bandied about in conversations and can easily come away with the impression that a single Cgo call will consume 150 milliseconds or something and instantly bring your program to a crawl, but a single call is not noticeable. It's only really a problem when scaled somehow. See also the belief that GC guarantees that my program is going to be frozen for 250 milliseconds every couple of seconds.) In this context probably the most relevant one is they need a full OS thread to work in the C runtime, so a million simultaneous Cgo calls would require a million simultaneous OS threads, and that's still a pretty big expense today.

For obvious reasons, Go can only "handle" millions of goroutines if all but some approximation of the number of CPUs available are doing nothing at any given time. The only common type of program that might have this circumstance is really network servers.
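A rough way to see the "cheap when idle" half of this is the sketch below, which parks a batch of goroutines on a channel and counts them. The function names are mine and the numbers are arbitrary. None of these goroutines holds an OS thread while blocked, which is exactly what would stop being true if each one were sitting inside a cgo call.

```go
package main

import (
	"fmt"
	"runtime"
	"sync"
)

// parkGoroutines starts n goroutines that block on a channel and returns
// a function that wakes them all and waits for them to finish.
func parkGoroutines(n int) (release func()) {
	stop := make(chan struct{})
	var started, done sync.WaitGroup
	started.Add(n)
	done.Add(n)
	for i := 0; i < n; i++ {
		go func() {
			defer done.Done()
			started.Done()
			<-stop // parked: no OS thread is tied up while waiting here
		}()
	}
	started.Wait()
	return func() {
		close(stop)
		done.Wait()
	}
}

// parkedDelta reports how much the goroutine count grew while n
// goroutines were parked.
func parkedDelta(n int) int {
	before := runtime.NumGoroutine()
	release := parkGoroutines(n)
	delta := runtime.NumGoroutine() - before
	release()
	return delta
}

func main() {
	fmt.Println("parked goroutines:", parkedDelta(100000))
}
```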

raggi · 2024-05-29T16:04:01.000000Z

Sure you can have a million goroutines, but you’ll now have more scheduler and gc overhead, and this is part of the issue.

Calling out to ffi requires preparing both the stack and the runtime for that call. It’s theoretically possible to avoid the need for this preparation, but with some caveats - you have to block the gc view of parts of the world (the parts ffi could be using) and you need to ensure that your runtime isn’t interacting badly with the ffi environment (e.g. not sending it signals and so on). Go’s arrangement (as is common with most runtimes) needs to do active work to prevent both problems, and the cost of cgo is largely the cost of doing those things. Most managed runtime/gc languages have these challenges and it’s one of the axes that people say makes them not systems languages (a poorly defined term for sure, but fair on an axis of “can’t make calls to other software modules at base call cost”)

neonsunset · 2024-05-29T16:09:46.000000Z

The cause of this is not a technical limitation of "this is not possible" but rather "Go does this poorly due to tradeoffs* it insists on".

C# trivially pins only select buffers or other data that need to be observed by whatever it is being called across FFI while GC is free to move and handle everything else (or, should the data need to be marshalled, it can be stack-allocated and pointer to such is passed instead, or it can be malloc and freed just like you can do in C). Years of work in this area made the impact of FFI as minimal as possible, with extra levers available to the user to completely erase the overhead (suppressing GC frame transition, using methods to allocate objects on the dedicated pinned heap, static linking for aot binaries, etc.).

Go's GC implementation is comparatively simple and has other inefficiencies such as very expensive write barriers.

*sometimes the tradeoff is just a product, which Go is, that was cheaper to produce, despite common beliefs

raggi · 2024-05-29T16:18:05.000000Z

Yup, and far too much spinning going on in the runtime too.

neonsunset · 2024-05-29T15:12:28.000000Z

Related: https://news.ycombinator.com/item?id=40435220

(C# can handle millions of active tasks, and so can Elixir, though with greater overhead as it offers a different set of tradeoffs)

neonsunset · 2024-05-29T15:11:11.000000Z

If only there was a high-level GC-based language with fast FFI and good performance, surely the industry would pick it based on the merits and not simply what is hyped...

pjmlp · 2024-05-29T18:10:44.000000Z

Sadly its owners have tainted a couple of generations to avoid anything from them, and the way they manage their OS isn't helping to change the point of view.

That is how we end up using less capable alternatives.

badrequest · 2024-05-29T19:31:42.000000Z

It's funny because that's how I feel about Rust on this website.

a-french-anon · 2024-05-29T11:57:21.000000Z

[flagged]

ncruces · 2024-05-29T12:04:37.000000Z

So your point is that a Lisp that ships a JIT will help you implement a "copy-and-patch JIT" because "macros" and "code is data"?

Let me point out that Go neither ships a JIT, nor has macros, so I really don't see how that's relevant insight for the Go ecosystem.

a-french-anon · 2024-05-30T07:15:28.000000Z

CL compilers aren't really JIT, they're AOT with user access to the compiler; yes, I know the distinction is blurry in this specific case (https://news.ycombinator.com/item?id=24387664).

Macros and homoiconicity have absolutely nothing to do with my example. The fact is that go provides a compiler, so it should allow one to embed it in the final binary to use it at runtime.

pjmlp · 2024-05-29T12:30:33.000000Z

Go has poor man's macros via //go:generate, and having a JIT available is an implementation detail, e.g. Yaegi could support a JIT if they feel like it.

jerf · 2024-05-29T13:20:54.000000Z

go:generate is not macro support. It is just the ability to specify a command to run in the source code when you run "go generate".

Having been around this loop multiple times now with people, here is the source code for go:generate: https://github.com/golang/go/blob/master/src/cmd/go/internal... I've highlighted the line where it executes the given command. This is not aimed at you specifically, pjmlp, but anyone in general who would like to vigorously assert go generate really is a macro facility is invited to point me at even a single line that involves such a thing in the source code. The only slightly useful thing it does is on line 366, where it will set some conceivably useful environment variables for the command it runs.

But otherwise, this is "go generate": You add:

    //go:generate echo hello!

and when you type "go generate" in the shell, go runs "echo hello!" for you. That's it. While you could conceivably use it to generate macros, you will be bringing 100% of the macro execution code to go generate; it provides no help.
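To illustrate how little is going on, here is a deliberately simplified sketch of the scanning half. It is not the real cmd/go code, which also handles quoting, environment variable expansion and -command aliases before executing the words as a process. generateCommands is a name I made up; it only extracts the command text and runs nothing.

```go
package main

import (
	"bufio"
	"fmt"
	"strings"
)

// generateCommands returns the command text of every line in src that
// starts with the go:generate directive. It does not execute anything.
func generateCommands(src string) []string {
	const prefix = "//go:generate "
	var cmds []string
	sc := bufio.NewScanner(strings.NewReader(src))
	for sc.Scan() {
		line := sc.Text()
		// The directive must start in the first column of the line.
		if strings.HasPrefix(line, prefix) {
			cmds = append(cmds, strings.TrimSpace(strings.TrimPrefix(line, prefix)))
		}
	}
	return cmds
}

func main() {
	src := "package demo\n\n//go:generate echo hello!\nfunc F() {}\n"
	for _, cmd := range generateCommands(src) {
		fmt.Println("would run:", cmd) // prints "would run: echo hello!"
	}
}
```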

The closest thing Go has to built-in macro support is that it does ship with full AST parsing, some code built on top of that, and it can re-emit code based on your changes.
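As a small sketch of that parse, change, re-emit loop using only the standard library (go/parser, go/ast, go/format): the helper below renames every identifier with a given name. renameIdent is a hypothetical name, and the rename is purely textual at the AST level, with no scope analysis, so treat it as a toy rather than a refactoring tool.

```go
package main

import (
	"bytes"
	"fmt"
	"go/ast"
	"go/format"
	"go/parser"
	"go/token"
)

// renameIdent parses src, renames every identifier called from to to,
// and returns the re-emitted, gofmt-style source. It ignores scoping.
func renameIdent(src, from, to string) (string, error) {
	fset := token.NewFileSet()
	file, err := parser.ParseFile(fset, "input.go", src, parser.ParseComments)
	if err != nil {
		return "", err
	}
	ast.Inspect(file, func(n ast.Node) bool {
		if id, ok := n.(*ast.Ident); ok && id.Name == from {
			id.Name = to
		}
		return true
	})
	var buf bytes.Buffer
	if err := format.Node(&buf, fset, file); err != nil {
		return "", err
	}
	return buf.String(), nil
}

func main() {
	src := "package p\n\nfunc old() int { return 1 }\n\nfunc use() int { return old() }\n"
	out, err := renameIdent(src, "old", "renamed")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Print(out)
}
```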

(I don't target this at you pjmlp because I'm pretty sure from previous interactions you are not a Go programmer, and a disturbingly large portion of the Go community itself vigorously and emotionally believes that go generate is a macro generation facility, so how were you to know? People seem to have a very, very hard time with a feature being labelled "generate" that in fact generates nothing.)

pjmlp · 2024-05-29T13:27:04.000000Z

No issues, that is also why I stated "poor man's", so it is even poorer than I imagined, :)

mseepgood · 2024-05-29T14:39:00.000000Z

It's not "poor man's", it's simply a completely different thing and as such not comparable and irrelevant to this topic. It's not run at runtime, it's not run at build time, it's run at development time before commit. It's like saying a house is a poor man's apple.

pjmlp · 2024-05-29T15:19:36.000000Z

[flagged]

jerf · 2024-05-29T15:38:35.000000Z

I do not have a second account.

I don't know what "it" is in that comment; I fear "it" is still some sort of macro capability, which doesn't exist, at dev time, compile time, run time, or otherwise. Like I said, the idea that "go generate" is some sort of macro language other than a thin wrapper around a command to run is very common.

I suppose it would be interesting to try to get people to describe this macro language that supposedly exists. What's the macro look like to transparently wrap a string in a print statement for debugging?

ncruces · 2024-05-29T21:52:39.000000Z

I guess the meta programming that most often gets used with //go:generate ends up being text/template.

The Go sort algorithm (pdqsort) is implemented in this way so the same algorithm can be used to implement the various existing sort APIs.

https://github.com/golang/go/blob/master/src/sort/gen_sort_v...
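A toy version of that pattern, under my own assumptions: one template, stamped out once per type, then run through go/format. The template, the variant struct and the render function are invented for illustration and are not taken from the sort generator. In practice a //go:generate line would invoke a small program like this and write the result to a file.

```go
package main

import (
	"bytes"
	"fmt"
	"go/format"
	"text/template"
)

// maxTmpl is a hypothetical template for one typed variant of a function.
var maxTmpl = template.Must(template.New("max").Parse(`package gen

// Max{{.Name}} returns the larger of a and b.
func Max{{.Name}}(a, b {{.Type}}) {{.Type}} {
	if a > b {
		return a
	}
	return b
}
`))

// variant names one instantiation of the template.
type variant struct {
	Name string // suffix for the generated function name
	Type string // Go type the function operates on
}

// render executes the template for v and formats the result as Go source.
func render(v variant) (string, error) {
	var buf bytes.Buffer
	if err := maxTmpl.Execute(&buf, v); err != nil {
		return "", err
	}
	out, err := format.Source(buf.Bytes())
	if err != nil {
		return "", err
	}
	return string(out), nil
}

func main() {
	for _, v := range []variant{{"Int", "int"}, {"Float64", "float64"}} {
		src, err := render(v)
		if err != nil {
			fmt.Println("error:", err)
			return
		}
		fmt.Println(src)
	}
}
```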

jerf · 2024-05-30T13:43:37.000000Z

100% of the work is done by the code that go:generate invokes, though. Literally 100%. go:generate brings nothing to the table, except like I said some env variables. You can simply directly invoke the thing go:generate invokes and it will work fine, because go:generate does nothing to help with code generation.

I've used go:generate once, and what my invocation did was extract a swagger file from the code. Because it doesn't matter what go:generate runs. You can make it so go:generate will run a shutdown command, or deploy your code, or start up Skyrim. 'Tis a strange "macro language" that will start Skyrim exactly as happily as it will "macro". And I pulled it out because it was easier to just run the code directly.

pjmlp · 2024-05-29T15:58:22.000000Z

No worries, I knew that, it was more a jab at the kind of remarks that don't help anyone. Sorry if you got annoyed with your reference.

ncruces · 2024-05-29T12:35:33.000000Z

I know. But you're not running //go:generate at runtime, and for Yaegi to gain a JIT... well, they'd have to implement much of what's in the blog. That's the point.

Mind you, I wouldn't mind having something that translates Wasm to Go at compile time, and shipping Go instead, for the cases where the Wasm doesn't change.

But people also load Wasm at runtime, and for that to work, you need the (Wasm) compiler available at runtime, and there just aren't that many such compilers for the Go ecosystem.