How Go uses Go to build itself (cheney.net)
108 points by spahl on June 4, 2013 | hide | past | favorite | 20 comments



The last time I looked, there was a lot of C code that needed to be compiled with a C compiler to build Go, and it was definitely not just for bootstrapping purposes, as part of the standard library was written in C. Is that still the case?


Most of the standard library is written in Go.

A large part of the runtime package (within the standard library) is written in C and assembly. This C code is compiled with one of the 5c, 6c, or 8c compilers.

gcc is only used for bootstrapping (I believe... though I am still trying to get my head around what happens).


I'm probably the naive one here, but is "Go using Go to build itself" much like what Paul Graham talks about re LISP?


In a much more complex sense, yes. Go's build process starts from various cruft found in Linux, OS X, or Windows, and makes a beeline for its own (more uniform) cruft, where things are uniform enough to build Go's toolchain.

It is interesting to me, primarily, because as a Go user, I never have to see it. The toolchain's abstractions are good enough that, as long as I don't meddle with the syscall package and keep my CGO interactions shallow and confined, I really don't worry about crossing between developing on OS X and testing and deploying on Linux. (Or even crossing between amd64 and ARM, now that Go 1.1 is out.)

Even cross-compilation is very straightforward, once again, assuming you stay the hell away from CGO.


Most languages can be used to build themselves, in a process known as compiler bootstrapping.

The main reason some compiler developers don't do it is mostly a question of using existing tooling instead of redoing all the required infrastructure in the new language.

Additionally porting to new platforms by bootstrapping usually requires cross-compilation, which for some developers might not be worth it, depending on the target audience of the language.


The Go developers have said they deliberately didn't try to write Go in Go because they've done that with other languages (possibly Alef, I forget), and a bug can arise where the fix would be obvious, except that the bug exists in the compiler itself and would be triggered by the fix. Instead, a more awkward fix has to be figured out that doesn't trigger the bug. Of course, once the bug is fixed, the original straightforward fix can be substituted, but it's all unwanted hassle.


How often does this come up? Under normal circumstances, wouldn't you be able to revert to a version of the compiler from before the bug was introduced? If you're still in initial development, then you have the old, foreign compiler to fall back on.


Seemingly often enough that it deterred them with Go. No, you may not be able to revert to before the bug was introduced, as the bug may have been there ever since that feature was added. As soon as the compiler is self-hosted, the old compiler quickly becomes irrelevant, i.e. the code rapidly diverges from what it can compile.


That is why bootstrapping, done properly, is always done in stages.

You write a compiler that can only compile a specific subset of the language, then use that subset to write the real compiler. There are endless book examples of how to do it.

Given who Go designers are, I think they don't have any issue keeping the C code around.


That's not how the books say to do it, and you're right, given who created Go you'd think they'd know this stuff. :-)

Many generations of the compiler are created. Let's say the compiler-in-C is worked on until it compiles subset Gosub1, which is just enough to write compiler-in-Gosub1 that duplicates compiler-in-C's behaviour. From then on, the compiler-in-C atrophies. G-2 features are implemented in G-1's compiler, though nothing uses them yet. The compiler's source then uses these, making it G-2 source, only compilable by a G-2-grokking compiler.

Weeks later we have a G-40 where a bug is discovered, introduced in G-20. It wasn't in the compiler-in-C so that's not useful. Choices include fixing it at `head', which can sometimes be awkward as described earlier, or fixing the initial G-20 implementation and then rolling forward all changes from there assuming the fix doesn't break code that was depending on the errant behaviour.


> That's not how the books say to do it, and you're right, given who created Go you'd think they'd know this stuff. :-)

Given that compiler design was one of the three main focuses of my CS degree, I read a few books along the way. :)

> Many generations of the compiler are created. Let's say the compiler-in-C is worked on until it compiles subset Gosub1, which is just enough to write compiler-in-Gosub1 that duplicates compiler-in-C's behaviour. From then on, the compiler-in-C atrophies. G-2 features are implemented in G-1's compiler, though nothing uses them yet. The compiler's source then uses these, making it G-2 source, only compilable by a G-2-grokking compiler.

It is not necessary to do this in such a fine-grained way.

The first version of the primitive language can already be good enough to offer the minimal set of features needed to compile itself.

Afterwards, the full-language compiler gets implemented in this minimal version and is used for everything else.

There aren't a thousand versions of the compiler; you just need to be restrictive about what is used in the base compiler.

> Weeks later we have a G-40 where a bug is discovered, introduced in G-20. It wasn't in the compiler-in-C so that's not useful. Choices include fixing it at `head', which can sometimes be awkward as described earlier, or fixing the initial G-20 implementation and then rolling forward all changes from there assuming the fix doesn't break code that was depending on the errant behaviour.

As I explained, this is not required, because you only have G-2 as the starting point, which is able to compile whatever the current version of the language is.

Additionally, you get the benefit of eating your own dog food and, as the compiler designer, checking whether you are making the right design decisions about how the language works.


Sorry, I don't understand. You seem to be saying there are only two versions of the compiler: one written in a foreign language, e.g. C, the other in a subset, e.g. Gosub, called G-2. But then there's "whatever the current version of the language is", which suggests to me incremental improvements, e.g. the language develops as experience is gained rather than being fully planned on day one. So doesn't G-2 undergo changes to implement these? You may keep calling it G-2, but there are many versions of it (I never said thousands).


Let's use Go as an example.

Now that the Go 1.0 release exists and is stable, one could write a Go compiler using Go 1.0.

Eventually the compiler will reach a state that it can fully compile Go 1.0.

Now replace the C implementation of Go 1.0 with this new compiler and use it to write Go X.Y, using only Go 1.0 features.

When the need to target a new OS or CPU arises, add a new backend to the Go 1.0 compiler that generates code for the desired target system.

Use the cross-compiler to compile itself with the new backend. Copy the binary to the new system, then use the Go 1.0 compiler to compile the Go X.Y version, whatever X and Y are.

You don't need to use multiple versions of the language, and keeping the feature set of the base compiler small makes it easier to write cross-compilers.


This is flawed AFAICS. It assumes that because 1.0 is fixed in specification, there are no bugs in the implementation. To return to my original point, these smart guys are on the record stating that's why they didn't do a self-hosting compiler; good enough for me. :-)


How is this different from having bugs in the C compiler used for the language implementation?


It is different, because of the stability of the C compiler, its bigger test audience, etc., compared to a new language under rapid development.


Except that is a false assumption.

Have you ever done multi-platform C development using OS-vendor-specific C compilers?

There are lots of nice bugs to be found; just check the available bug databases of any C compiler.

So this does not make it any better.


Yes, C across AIX, Suns, Silicon Graphics, whatever those HP ones were, and others. Platform differences were common, bugs rare, because many had been there before me and the bugs could always be worked around; I didn't have to fix a C compiler. When writing a compiler, the aim is to fix the compiler.

This isn't getting us anywhere. We disagree. I value the opinion of that lot given their many decades of experience. I used to have your opinion, based on textbooks. They've made a good point, one I can see has considered thought behind it.


> I used to have your opinion, based on textbooks.

I do have compiler development experience, but alas as you say this is not getting us anywhere.


What things about LISP in particular? I want to say no, but I'm not sure I'm clear on what you're asking.



