
"Tell me about writing a compiler"

There's no perfect way to interview for what you're looking for, but compilers seem to require both strong organizational skills and some algorithmic chops. There's enough complexity that you can't hide a lack of one or the other.

Anyway, asking about building and enhancing a compiler seems like one of the ways to get insight about how people grow code. Every program is a seed. Is this the kind of guy that will end up with a mighty oak tree, or kudzu in 10 years?



Compilers aren't real-time, distributed, or multi-user... basically anything that is hard from a software engineering perspective, which includes deployment and testing. Compilers have reproducible test cases (if they don't, that is a bug in itself). You can catch regressions in a compiler with an automated test suite, and bisect its version history to see where one was introduced. Bug reports from the field tend to be easy to confirm. A compiler isn't going to work well with 10,000 users but then mysteriously spiral to a crash when there are 10,100.

Compilers have been developed by lone developers in isolation. Some compilers are only a few thousand lines of code.

Someone who can demonstrate knowledge of making a compiler could well be completely behind in their knowledge of language design. He or she can show how calculations are turned into a tree and then into code for a register machine (i.e. "for-mula tran-slation"), but has no idea how to compile exception handling or closures, or how to do advanced things with types (beyond just checks) and such.
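To be fair, that "formula translation" core really is small. A toy sketch in Python (all names hypothetical, an idealized three-address register machine assumed):

```python
from dataclasses import dataclass
from typing import Union

@dataclass
class Num:
    value: int

@dataclass
class BinOp:
    op: str            # '+', '-', '*'
    left: "Expr"
    right: "Expr"

Expr = Union[Num, BinOp]

def compile_expr(node: Expr, code: list, counter=None) -> str:
    """Walk the tree bottom-up, emit three-address instructions,
    and return the name of the register holding the result."""
    if counter is None:
        counter = iter(range(1_000_000))   # fresh register names
    if isinstance(node, Num):
        reg = f"r{next(counter)}"
        code.append(f"{reg} = {node.value}")
        return reg
    lhs = compile_expr(node.left, code, counter)
    rhs = compile_expr(node.right, code, counter)
    reg = f"r{next(counter)}"
    code.append(f"{reg} = {lhs} {node.op} {rhs}")
    return reg

# (1 + 2) * 3
code = []
result = compile_expr(BinOp("*", BinOp("+", Num(1), Num(2)), Num(3)), code)
# code is now:
#   r0 = 1
#   r1 = 2
#   r2 = r0 + r1
#   r3 = 3
#   r4 = r2 * r3
```

Which is exactly the point: this part fits in an undergrad exercise, while exceptions, closures, and a real type system do not.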

That is to say, the bar for what can be legitimately called a compiler is quite low.

On the topic of languages, I'd rather interview someone who knows about a lot of modern language features and how they can contribute to better program organization, yet is foggy about the details of how some of them might be compiled to machine code.

There is still "compiler worship". On a previous job, I fixed a code generation bug in gcc affecting MIPS targets. Everyone was talking about that: "like wow, he fixed gcc".


If the point is looking for organizational skills, I think compilers are legitimate. They're big complicated beasts. Sure, some people working in isolation can make amazing things in a few thousand lines of code - I think most people would be happy employing Fabrice Bellard.

Writing a C compiler seems like one of those standard undergrad activities. If a programmer can make that work, they probably have sufficient organizational skills. It seems like real-time, distributed, multi-user app development is a separate filter.

I guess it's the distinction between looking for someone who is capable of solving those problems vs looking for someone who has solved those problems.


"Tell me about writing a compiler" seems mostly to be a question intended to restrict your hires to fresh grads who took a compiler course or if you're trying to poach compiler writers from another company. You're not likely to get a very useful response from someone who hasn't written one recently or at all. You could just as well ask "Tell me about writing a POSIX shell", or "Tell me about writing a browser", or basically any "Tell me about writing a <FOO>". It's not a useful question unless you're specifically hiring them to write a <FOO>.


Agreed. Nowadays, writing the runtime environment including a concurrent garbage collector is more complicated than writing the compiler.


When you say concurrent, do you mean "works with threads" or "doesn't pause"?


It's both. If it doesn't work with threads, there is no concurrency.

If the threads have to stop, it's still concurrency in the sense that threads are stopped in arbitrary places (just like an interrupt can stop a task at any point in its execution and dispatch another task or whatever).

But the garbage collector isn't running concurrently with the threads.

I.e. I think the term "concurrent garbage collector" is understood to be in contrast with "stop-the-world garbage collector" not with "synchronous garbage collector as a library function in a single-threaded virtual machine".


Forgive my ignorance, but are there any extant examples of garbage collectors (real GC, not reference counting) that don't stop the world?


So when you say "real GC, not reference counting", you have to understand that tracing GC and reference counting are duals, and every concurrent collector is going to be some hybrid of the two [1].

But there are a number of algorithms that most people would probably consider "real GC" that don't stop the world. Modern Java GCs use the train algorithm [2], which bounds pause times to some arbitrarily-sized constant. Erlang per-process heaps [3] also give good soft-real-time characteristics.

It is theoretically possible to run the whole GC in a background thread (eg. while a GUI is idle). Both the JVM and the CLR have options for this [4][5], but they're only useful for workloads with plenty of idle time (eg. GUI apps, lightly-loaded servers) and so aren't enabled by default.

[1] http://researcher.watson.ibm.com/researcher/files/us-bacon/B...

[2] http://www.ssw.uni-linz.ac.at/General/Staff/TW/Wuerthinger05...

[3] http://prog21.dadgum.com/16.html

[4] http://www.oracle.com/technetwork/java/javase/gc-tuning-6-14...

[5] https://msdn.microsoft.com/en-us/library/ee787088(v=vs.110)....


In typical software today, much of the complexity arises from interactivity. There are user actions, network events and all sorts of contingencies that make the program flow non-deterministic.

A compiler doesn't have that. It's blissfully old-fashioned, really: read files in, write files out -- no interruptions or real-time requirements. You could do compilers on punch cards.

In a way, the compiler is the most complex piece of software that you could imagine in 1960... But software has moved on and faces a different kind of complexity.

Hence I'm not sure if asking about compiler architecture is really representative of the organizational skills needed by today's software engineering.


I used to work on the Delphi compiler, for 6 years or so. That compiler was not a simple function of file input to file output. It lived in an IDE. It had to do incremental compilation, throwing away the compiled version of an in-memory file and recompiling it, but avoid recompiling the whole world of downstream dependencies if not necessary. It had to provide code completion, which meant parsing and type analysis of a file, but skipping codegen, with reasonably sophisticated error recovery. It had to provide the debug expression evaluator, which lived intermixed with the constant evaluator, and needed to know how to set up calls to symbols in the program being debugged. And it had to do all of these things in a long-lived process, with no memory leaks and high reliability.

The degree of asynchronicity is low, I agree. The most significant was just interrupting compilation on user request. But the complexity was not trivial.

These days I work at a startup on a full stack SaaS; C++, Java, Rails, and Coffeescript with as few blocking browser actions as possible. It is far less complex than compiler work, but it pays better. I've never had to debug OS code to diagnose runtime library issues in this job; nor write manual structured exception handling. I haven't had to try and build stack frames dynamically to implement reflection calls, nor unpick stack frames dynamically to implement virtual method interception. The relative complexity of potential races with SQL read-committed, or writing UI such that it can't become inconsistent, really doesn't compare.


Compilers are a nice example because they require a lot of code, and all the code has to be right. If you can't impose enough organization there, you don't have a chance at getting a distributed event driven system to work.


Compilers are simple. They are organized as a pipeline: frontend, middle end, backend. In more detail: lexer, parser, semantic analysis, optimization passes 1 to n, instruction selection, scheduling, register allocation, assembly.
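That pipeline shape can be sketched end to end for a toy one-operator "language" (everything here is hypothetical and deliberately oversimplified; each stage is a pure function from one representation to the next):

```python
def lex(src):
    # "1 + 2" -> ["1", "+", "2"]
    return src.split()

def parse(tokens):
    # only flat binary expressions: num op num
    a, op, b = tokens
    return ("binop", op, int(a), int(b))

def optimize(ast):
    # constant folding: here the whole tree happens to be constant
    _, op, a, b = ast
    return ("const", a + b) if op == "+" else ast

def select(ast):
    # instruction selection for an imaginary two-register machine
    if ast[0] == "const":
        return [f"mov r0, {ast[1]}"]
    _, op, a, b = ast
    return [f"mov r0, {a}", f"mov r1, {b}", "add r0, r1"]

program = "1 + 2"
for stage in [lex, parse, optimize, select]:
    program = stage(program)
# program == ["mov r0, 3"]
```

The simplicity is in the shape, not the stages: each box in the pipeline can grow arbitrarily hairy without changing the overall architecture.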

Ok, you can make it more complex. I work on a compiler which does it lazily: transformations are per file and depend on each other. If something goes wrong, you print out the graph to figure out the problem. No idea what the benefits could be. It might actually be a good example of where it could have been simpler.


Yes, but there are a bunch of tradeoffs. Do you try to lex the floating point format, or push that back to the parser? Do you introduce an extra pass for constant folding, or do you try to shoehorn it into AST generation in the parser?
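The "extra pass" side of that constant-folding tradeoff might look like this: a toy post-parse rewrite over a tuple-based AST (the representation and names are hypothetical), which keeps the parser simple at the cost of one more tree walk:

```python
import operator

OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul}

def fold(node):
    """Recursively replace constant subtrees with their values."""
    if not isinstance(node, tuple):     # leaf: int literal or variable name
        return node
    op, left, right = node
    left, right = fold(left), fold(right)
    if isinstance(left, int) and isinstance(right, int):
        return OPS[op](left, right)     # fold here, not in the parser
    return (op, left, right)

# ("*", ("+", 1, 2), "x")  ->  ("*", 3, "x")
folded = fold(("*", ("+", 1, 2), "x"))
```

The shoehorn-it-into-the-parser alternative would do the same check inside each AST-constructing rule: fewer passes, but the parser now knows about arithmetic.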

The nice thing about compilers is that they require a ton of code.

It's the kind of thing that any programmer (well, any good one) can do, and there are organizational tradeoffs to be made. There is no "right" answer; it's an opportunity to talk about organization. There are blind alleys that seem like a good idea but turn out not to be. There are opportunities to merge and split passes. Some layers might want information from other layers; some layers might look very similar to other layers.



