Mike says in his talk that LLVM started out with the goal of having an intermediate representation to run optimizations on. Hasn't this always been the case with compilers in recent times? I seem to recall similar things being said about the GNU Compiler Collection at a presentation many years ago.
If this is true, what is (was) the appeal of the LLVM project at the time of the project inception?
GCC’s passes, at least back then (I don’t know about today), used numerous intermediate representations, many of which were simply dumps of one pass’s internal data structures that the following pass happened to be able to parse (usually by including the previous pass’s header).
LLVM’s IR is the same “thing” all the way through the pipeline, and that thing is standardized separately from the compiler, so you can write tooling that processes it and expect it not to break with new LLVM versions.
That same “thing” also round-trips to a text representation that looks like assembly. That means people can talk about it more easily, and programmers can use it to write test cases or even write code in it directly.
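To give a sense of that textual form, here is a minimal hand-written sketch (my own example, not taken from the talk): a single function that adds two 32-bit integers, written in the same notation the optimizer passes consume.

    ; add.ll - a tiny function in LLVM IR's textual form
    define i32 @add(i32 %a, i32 %b) {
    entry:
      %sum = add i32 %a, %b    ; 32-bit integer add of the two arguments
      ret i32 %sum
    }

You can round-trip it yourself: llvm-as assembles the .ll text into bitcode, and llvm-dis turns the bitcode back into the textual form, so test cases written against the text exercise the same IR the passes actually see.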
If you really wanted to, I imagine you could write a proprietary compiler that generated some serialisation of one of GCC's IRs (GENERIC or GIMPLE), and write a deserialiser for GCC that reads that IR back in. Release the deserialiser under the GPL, and you could then legally use GCC as the backend for your proprietary compiler.
This sort of IPC-like hackery often seems to happen when someone is looking to work around copyleft - they can then break the spirit of the GPL without breaking its legal obligations.
That's not too far from how the first LLVM frontend worked, IIUC. llvm-gcc was a GCC fork (pre-Clang) that produced LLVM bitcode or IR, which was then fed into an LLVM backend for code generation.
You would still need to use GCC code to operate on the deserialized IR, which is GPL. Unless you then transform the IR into a proprietary one, in which case that's not breaking the spirit of the GPL.
At the time, I had a DSL I had written in C++, and I wanted to produce proper compiled binaries. I had started looking at the now-defunct TenDRA - I was thinking it would be nice to compile to a “machine-independent binary” that could be converted to machine code on the actual runtime system. But soon after I found LLVM, and it seemed like a no-brainer comparatively.
I seem to remember that "machine-independent binaries" was Apple's first use of LLVM: distributing LLVM IR and having it be converted to machine code on the user's computer, back when they were supporting Motorola and Intel chips. And I think consequently that's how LLVM got a lot of its momentum.
> Mike says in his talk that LLVM started out with the goal of having an intermediate representation to run optimizations on. Hasn't this always been the case with compilers in recent times?
Not just recent times. The Dragon Book from 1986 covers IRs.
> Hasn't this always been the case with compilers in recent times?
Yes. LLVM used SSA form earlier than GCC did, and it makes its intermediate representation more accessible than GCC or other compilers do. But the general idea of having a mid-level IR for optimizations was not at all novel at the time LLVM appeared. I think the point was mainly to emphasize that LLVM IR allows more optimizations than the somewhat comparable IR of JVM bytecode.
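To make the SSA point concrete, here is a small hand-written sketch (mine, not from the talk) of what SSA looks like in LLVM IR: each value is assigned exactly once, and a phi node picks the right definition where control flow merges.

    ; myabs.ll - SSA form: one assignment per value, phi at the join point
    define i32 @myabs(i32 %x) {
    entry:
      %isneg = icmp slt i32 %x, 0            ; is x < 0 ?
      br i1 %isneg, label %neg, label %done
    neg:
      %negated = sub i32 0, %x               ; negate x
      br label %done
    done:
      %result = phi i32 [ %negated, %neg ], [ %x, %entry ]
      ret i32 %result
    }

Stack-based JVM bytecode expresses the same computation as pushes and pops, so an optimizer first has to recover this def-use structure before it can do much with it, which is part of why SSA-based IRs are friendlier to optimizers than stack bytecode.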
I attended this talk, and it was one of the highlights of the conference for me. Mike is a great speaker and I found myself inspired to dive deeper into LLVM internals afterwards.
https://www.youtube.com/watch?v=VKIv_Bkp4pk