Advanced Compilers: Self-Guided Online Course (cornell.edu)
827 points by matt_d on Dec 11, 2020 | hide | past | favorite | 232 comments



I’ve worked on multiple compilers (as an optimizations expert) at MSFT on VS and CUDA, and have developed a DSL and worked on database compilers. I can’t hire compiler people with the right skills. We’re building an IDE, and those parsers are written differently; we use Scala packrat parser combinators. These courses teach very little judgement or industry-relevant material. When do you use a packrat parser vs. LALR vs. LL? Good luck hiring an engineer out of advanced compiler courses who knows any of this. I’d like to sit down all university professors who teach compiler courses and teach them a course on what’s relevant.


> I’d like to sit down all university professors who teach compiler courses and teach them a course on what’s relevant.

I don't teach compiler courses, but I'm an academic, and I do keep in touch with industry people to find out what's important for my data analysis course. A couple of big problems with your proposal:

1. Time variation in "what industry wants". This year one thing's hot, the next year it's something else. A bare minimum requirement for any course is that a student taking it in their second year should learn something that's still relevant three years later when they're in the first year of their job.

2. Cross sectional variation. There's no such thing as "what industry wants". Every place wants something different, with only a small subset common to 80% of employers.


My two cents - what’s useful for employers is giving students the confidence to jump into the code and figure it out. By this measure, what’s relevant for a compilers course isn’t the material itself. It’s that it’s structured to force the students to figure certain things out for themselves.

There are college hires who need to be spoon fed, and college hires who just need mentorship and direction. Group two are invaluable.


Andy Pavlo of CMU DB fame echoed this in one of the intro video lectures to one of his courses. He said his contacts in industry valued people who can navigate a large codebase productively. Most uni courses seem to involve writing a "toy" solution from scratch, which is seldom the case in professional programming.

He designed the course to be an exercise in working in an existing codebase that one needs to understand before adding features.

I guess that is one of the many reasons CMU grads kick open the door into many companies.


I am currently in this class and can confirm that navigating the codebase is a nightmare. The specifications we are given in the relevant assignments are vague or wrong and the code itself is pretty questionable. I spend more time trying to understand the assignment and current code than actually writing anything myself. If that's what he's going for, I guess he succeeded.


> the code itself is pretty questionable

Sounds like you got yourself a high quality code base!

It's normal for a code base to be over a decade old, during which time it was worked on by a team of a dozen positions, each of which turned over on average every 3 years. So now you've had a few dozen people, each with their own ideas of what's a clear way to express a problem, trying to solve evolving business requirements under tight deadlines. And there's a 50/50 chance the project originally started as a 'throwaway' proof of concept.


What strategies do you find most effective to understand what’s going on in the codebase then?

How informative is git log/blame?


> Most uni courses seem to involve writing a "toy" solution from scratch, which is seldom the case in professional programming.

Without writing a "toy" from scratch, one would never understand professional programming or codebases either. They wouldn't understand why those are oftentimes poorly designed, full of bugs, etc. Tracing your own "from scratch" steps, with self-reflection, is the best way to be prepared for "professional" programming. First make and find mistakes yourself; only then will you be ready to find mistakes in, and improve on, other people's stuff.


>>Without writing a "toy" from scratch, one would never understand professional programming or codebases either. They wouldn't understand why those are oftentimes poorly designed, full of bugs, etc. Tracing your own "from scratch" steps, with self-reflection, is the best way to be prepared for "professional" programming.

I disagree. All the toy-from-scratch exercise provides is a way to get the student to go from zero to a poorly designed solution during the course, which they can forget ever existed as soon as the grade is in.

Meanwhile, you get no experience onboarding to an unfamiliar codebase, finding and fixing a bug, implementing a feature, or refactoring it to reduce technical debt according to some criteria. And that's at least three quarters of a developer's work.


It’s a combination of a rigorous program, and very smart people as inputs.


ibains made a specific point about language processing in IDEs being different to that of (traditional) compilers. Presumably there exists a state-of-the-art there just as there's a state-of-the-art for conventional compilers, but academia doesn't give it as much attention.


Which is still countered by:

2. Cross sectional variation. There's no such thing as "what industry wants". Every place wants something different, with only a small subset common to 80% of employers.

No?


I don't think that really holds up. Ultimately you could say this about any small mismatch between desired skill-sets and those available, but at some point we call it hair-splitting.

If you need someone to build a parser for your IDE, you're presumably better off with an expert in traditional compilers, than with someone who knows nothing at all of language processing. You'd be better off still with someone experienced in language processing within IDEs. Less mismatch is better, even if it's impractical to insist on no mismatch at all.


Text -> parser -> AST -> job done. If it's any different in an IDE vs anything else I'd like to know how.


Your IDE parser will be unusable if it goes bananas while you're typing the characters needed to get from one fully, correctly parseable state to the next.

It needs to be able to handle:

    printf("hello");
and also:

    prin
and also:

    printf("He
It also needs to be able to autocomplete function signatures that exist below the current line being edited, so the parser can't simply bail out as soon as it reaches the first incomplete or incorrect line.


Isn't that pretty standard for a parser? When was the last time a compiler bailed on the very first error it hit and refused to do anything else?

The solution is to pick synchronisation points at which to start parsing again, e.g. ; at the end of a statement or } at the end of a block.
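A minimal sketch of that idea, using a hypothetical toy statement grammar (Python purely for illustration, not from any real compiler): on an error, the parser records it, skips ahead to the next sync token, and resumes.

```python
# Panic-mode recovery sketch for a toy "ident = ident ;" language.
# The grammar and all names here are illustrative.

SYNC_TOKENS = {";", "}"}  # synchronisation points, as described above

def parse_statements(tokens):
    """Parse 'name = value ;' statements, recovering at sync tokens."""
    stmts, errors, i = [], [], 0
    while i < len(tokens):
        try:
            name, eq, value, semi = tokens[i:i + 4]
            if not (name.isidentifier() and eq == "=" and
                    value.isidentifier() and semi == ";"):
                raise SyntaxError(f"bad statement near {name!r}")
            stmts.append((name, value))
            i += 4
        except (SyntaxError, ValueError):
            errors.append(f"syntax error near token {i}")
            # panic mode: skip to the next sync token, then keep parsing
            while i < len(tokens) and tokens[i] not in SYNC_TOKENS:
                i += 1
            i += 1  # consume the sync token itself
    return stmts, errors

# the bad token '@@' is reported, but 'c = d ;' after it still parses
good, errs = parse_statements(
    ["a", "=", "b", ";", "@@", ";", "c", "=", "d", ";"])
```

One error doesn't poison the rest of the file, which is the property an IDE parser needs.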


> When was the last time a compiler bailed on the very first error it hit and refused to do anything else?

Make still does this. (That's the "Stop." in the famous "*** missing separator.  Stop.") Many errors in Python still do this.

As late as 2010 I still saw some major C compilers do this.

99% of the toy compilers written for DSLs do this, or worse.

Good error recovery / line blaming is still an active field of development.


> Good error recovery / line blaming is still an active field of development.

True. But let's get terminology straight: that's not a compiler science, that's parsing science. And it's no more compiler science than parsing a natural language is.


What terminology are you talking about? Neither "compiler science" nor "parsing science" are terms I used, or that the industry or academia use.

Parsing - formal theory like taxonomies of grammars, and practical concerns like speed and error recovery - remain a core part of compiler design both inside and outside of academia.


How can you be sure that that } is the end of a certain defined block? This most importantly affects scoping, and in many cases it's ambiguous. IDEs do have rich metadata beyond the source code, but then parsers need to be aware of it.


You're ignoring the ; which are sync points.

> How can you be sure that that } is the end of a certain defined block

If it's not in a string, what else is it but a typo? If a typo, it fails to parse but so long as it doesn't crash, fine.


Maybe my wording is not accurate, imagine the following (not necessarily idiomatic) C code:

    int main() {
        int x;
        {
            int x[];
        // <-- caret here
        x += 42;
    }
This code doesn't compile, so the IDE tries to produce a partial AST. A naive approach will result in the first } matching the second {, so `x += 42;` will cause a type error. But as is noticeable from the indentation, it is more believable that there was, or will be, a } matching the second { at the caret position, and that `x += 42;` refers to the outer scope.

Yes, of course parsers can account for the indentation in this case. But more generally this kind of parsing is sensitive to a series of edit sequences, not just the current code. This makes incremental parsing a much different problem from ordinary parsing, and also is likely why ibains and folks use packrat parsing (which can be easily made incremental).
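For a concrete (and heavily simplified) picture of the packrat idea, here is a sketch of a PEG-style parser for a made-up arithmetic grammar: memoising each (rule, position) pair is exactly the packrat trick that makes ordered-choice backtracking cheap, and incremental variants work by invalidating the memo entries around an edit. Everything below is illustrative, not anyone's production parser.

```python
# Packrat sketch: a PEG with ordered choice, memoised per (rule, position).
from functools import lru_cache

def make_parser(text):
    @lru_cache(maxsize=None)
    def expr(pos):
        # expr <- term '+' expr / term   (ordered choice, backtracks)
        t = term(pos)
        if t is not None and t < len(text) and text[t] == "+":
            rest = expr(t + 1)
            if rest is not None:
                return rest  # first alternative succeeded
        return t             # fall back to bare term

    @lru_cache(maxsize=None)
    def term(pos):
        # term <- [0-9]+ ; returns the end position, or None on failure
        end = pos
        while end < len(text) and text[end].isdigit():
            end += 1
        return end if end > pos else None

    return expr

parse = make_parser("1+2+30")
assert parse(0) == len("1+2+30")  # a full parse consumes all input
```

Note that on incomplete input like "1+" the parser doesn't crash; the ordered choice just falls back and consumes what it can, which is the behaviour an editor wants while you type.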


Partial parse state and recovery are critical. You don't want the entire bottom half of a file to lose semantic analysis while the programmer figures out what to put before a closing ).


Does that issue go away if you use packrat vs some other means?


Packrat parsers are notably faster than recursive descent parsers (also critical for IDE use) and by turning them "inside out" (replacing their memoization with dynamic programming) you get a pika parser which has very good recovery ability.

There are varying techniques to improve error recovery for all forms of parsing, but hacked-up recursive descent (certainly the most common kind of parser I still write for my hacked-up DSLs!) has poor error recovery unless you put in the work. Most LR parsers are also awful by default.

When I was in university most focus was on LL and LR parsers, with no discussion of error recovery and more focus on memory usage/bounds than speed. I also have no idea how common it is to teach combinator-based parser grammars these days; stuff like ANTLR and yacc dominated during my studies. This would add another level of unfamiliarity for students going to work on a "real compiler".


> Packrat parsers are notably faster than recursive descent parsers

I think this needs to be qualified. I don't think a packrat parser is going to beat a deterministic top-down parser. Maybe the packrat parser will win if the recursive descent parser is backtracking.


True - the performance of a top-down parser is going to depend on if/how often it backtracks and how much lookahead it requires. But this requires control over your grammar, which you might not have, e.g. when you are parsing a standardized programming language. Practically speaking, unless you want two separate parsing engines in your IDE, the fact that Lisps are actually LL(1) and Python and Java are "almost LL(1)" doesn't get you closer to parsing C++.

Packrat parsers are equivalent to arbitrary lookahead in either LL or LR grammars.


u/morelisp may have meant faster to develop, not runtime performance.


Thanks for mentioning pika parsers. I've just had an enjoyable read of the pika parser paper.


Yes. Go and look up Tree Sitter and its references.


Parsing (use of rather than theory) matters as it affects my work. So I followed up.

See https://youtu.be/Jes3bD6P0To

Tree sitter is based on LR parsing (see 23:30 in above video) extended to GLR parsing (see 38:30).

I've had enough of fools on HN posting unverified crap to make themselves feel cool and knowledgeable (and don't kid yourself that you helped me find the right answer by posting the wrong one). Time I withdrew. Goodbye HN.


I'm not sure what you think you're refuting but Tree Sitter definitely does some different stuff to allow recoverable parsing.


There is no such variance, no new fashions in compilers every year. There are hardware compilers and tools, primarily including language servers. The move to the cloud requires source-to-source compilers for data and transforms.


> There is no such variance and new fashions in compilers every year.

Indeed. Packrat is exactly an example of a passing fad in parsing. I find it telling that you, an "optimizations expert", are so worried about ways to parse. Shows that "new fashions every year" have reached the compiler industry too.


Hiring for exact skills is what you do when you need the skills ASAP and the actual supply of engineers is there. The time you waste looking for someone with the exact knowledge you need could have been spent getting a good compiler engineer and just teaching them what you know. And magically that good engineer will get better, it's pretty amazing stuff.

I was told multiple times that university/school teaches you how to learn. You get your actual knowledge and skills from the job. What I learned in 3 years of university was nothing compared to what I learned in 6 months on the job.


At first, it sounds like "win/win" if universities trained their students to use the same tools companies want. Companies would save money for training, and graduates would be ready for their jobs.

But in the longer term, it is actually "win/lose", because if three years later the fashionable tools change, the companies that optimize for saving money on training will simply fire their existing employees and hire new graduates.

For the students, it is better to be the kind of person who understands the deep principles and can learn new tools as needed.

And the companies have a choice to either offer job trainings, or offer higher salaries to people currently working in other companies who already have the needed skills. (Or whine about market shortage and import H1B servants.)


I will second this. I would much rather hire an engineer who has a related background and wants to learn over an engineer with the exact skillset but does not want to learn. It may take a few months, but the former engineer will pick up what you want them to know (if you mentor them correctly), and past that inversion point, they are immensely more valuable.


You are thinking about it the wrong way. In this niche, you will almost never find people with the exact skills you are looking for. Better to find people with tangentially related skills and experience and let them grow into the role. Academia is never going to be comprehensive enough to prepare students for every niche that has some demand. Rather, they should just focus on teaching a core that will help them learn what they need later on demand.

> When do you use packrat parser vs LALR vs LL?

If you need ok performance and good error recovery (especially if you are doing stuff in an IDE!), none of the above: recursive descent is not that hard to implement, and is incredibly flexible.
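To illustrate how little machinery that takes, here is a toy recursive-descent parser/evaluator for arithmetic expressions (a sketch with an invented grammar, not production code); note how naturally a precise, local error message falls out at each decision point.

```python
# Recursive descent sketch: one method per grammar rule, errors in context.
class Parser:
    def __init__(self, text):
        self.text = text
        self.pos = 0

    def error(self, msg):
        raise SyntaxError(f"{msg} at position {self.pos}")

    def peek(self):
        return self.text[self.pos] if self.pos < len(self.text) else ""

    def expr(self):          # expr := term (('+'|'-') term)*
        value = self.term()
        while self.peek() in ("+", "-"):
            op = self.peek()
            self.pos += 1
            rhs = self.term()
            value = value + rhs if op == "+" else value - rhs
        return value

    def term(self):          # term := number | '(' expr ')'
        if self.peek() == "(":
            self.pos += 1
            value = self.expr()
            if self.peek() != ")":
                self.error("expected ')'")  # precise, local error message
            self.pos += 1
            return value
        start = self.pos
        while self.peek().isdigit():
            self.pos += 1
        if start == self.pos:
            self.error("expected a number or '('")
        return int(self.text[start:self.pos])

assert Parser("1+(2-3)+40").expr() == 40
```

Because every rule is ordinary code, extending it with lookahead, recovery, or context checks is just more ordinary code — which is the flexibility argument.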


Packrat parser combinators are recursive descent with infinite lookahead. That’s the problem: engineers are undereducated.


I agree. Engineers are undereducated. This is not going to improve.

Poor and middle class kids don't have the luxury to soak up as much education as they feel like before choosing a topic they are interested in. That is a privilege reserved for the wealthy.

We go to schools that are pragmatic, teach us some basics, and then get us out the door to do cog work.

Look at all of the comments you have drawn. Let me spell it out for you: your expectations are too high.

Society is not designed to create compiler engineers you can hire fresh out of school. It is churning out bright scrappy people who are hungry for class mobility, a decent standard of living, and an interesting job to smash their brain against.

Maybe look for one of those instead of expecting a perfect candidate to drop out of the sky. You have a cool-as-hell project, so I'm sure you could find a brilliant person willing to learn.


The problem is that some engineers are ideological; you don’t want to work with those. If you have this argument during an interview and the interviewing engineer insists that X is always the right choice, run away.

Anytime a commercial language gets bogged down in parsing, unless that is core to the product (and mostly it isn’t), also run away. Parsing represents like 1% of the work that needs to be done on a compiler, even if it’s an IDE parser that needs to be really incremental (the only worse thing is to get stuck on lexing).


Do you think it is the responsibility of academia to teach "industry relevant stuff"? I agree that courses generally don't teach judgement well, but I do think that it is the workplace's/industry's responsibility to train their engineers and not expect fresh grads to be fully equipped to work on something that specific.


> Do you think it is the responsibility of academia to teach "industry relevant stuff"?

I don't think that was what ibains was going for, though I don't fault you for seeing this in that comment. (Especially since I don't think this course suffers from that problem, and generally I think things are rapidly improving on this front.)

The problem, to me (and I think to him too), is that quite a lot of things such courses teach as well-thought-out, well-working solutions and approaches just don't work very well, and frequently I find their comments on what's a 'good taste' solution and what isn't to be completely at odds with my understanding of the problem space.

E.g. in parsing (as that's the topic he mentioned): First, lots of people just spend way too much time on it. And they focus on parts that are of zero use to beginners (like explaining the several grammar families), and then use obtuse parser generators that save no work, sometimes in bizarre ways (like picking an LALR parser generator, then hand-editing the output to support a not-quite-LALR language). Meanwhile a recursive descent parser is easy to write, fast, and gives pretty good error messages with _very_ little work. You do need to know enough about grammars to be able to write one down and tell whether it describes an ambiguous language, so this should be taught, but you don't need to understand the language zoo well.


>I don't think that was what ibains was going for, though i don't fault you for seeing this in that comment

Isn't that literally exactly what he was going for?

"I’d like to sit down all university professors who teach compiler courses and teach them a course on what’s relevant."


You might believe academia should not impart knowledge students require for their subsequent jobs. Ask the students why they go to university.

See what is taught at UC Berkeley, with Spark and everything else coming out of there. I took a systems course with Eric Brewer - totally amazing - the context I got.


I think the ideal is really that a student should be capable of going into the world and picking up a new technology within the same area without too much trouble — I may have used ANTLR for a class, but I should also understand it well enough to write my own parser, or start looking into other parsers.

But if the gap is such that knowing ANTLR doesn’t really help me use packrat parsers (haven’t used them myself, so I don’t know) then it’s a gap probably worth filling.

But ideally students should learn the fundamental architecture/algorithms, not the specific tools, popular algorithms & implementations, and all the accidental complexity that came with it.

Which is where the industry vs academia problem usually comes in — industry often only cares about knowledge of a particular tool, and only sometimes cares about knowledge-transferability (I think largely because HR exists, and operates independently of the actual engineer making the request, with the first pass review).

The company hires for Hadoop, the university teaches distributed systems, and ideally the student can be trained in Hadoop specific in short order.

India also offers an example of the extreme alternatives — heavily industry driven courses focusing on specific tools, and you end up with students having practical knowledge for current tooling... but no ability to transfer it. Like the kind of resource who refuses to program C# because they’re a Java developer, and it’s all they know


Most languages are context sensitive.

Most language tools are context free.

How did we go so wrong?


Languages being context sensitive results from hacking in new language features. It's a constant struggle to keep D context free :-/


Strictly speaking, D is not context free nor is any statically typed programming language. That said most languages, including D, have supersets that are context free and can be used to build an AST which can then be refined further with context sensitive checks (ie. variables are used after declaration) and then even further with type checks.

Many language tools don't need a full blown language parser and can get by with a context-free parser for the language's superset. Things like autocompletion, spell checking, documentation/highlighting can all be implemented using context-free parsers.


I suppose we could argue about the definition of context free, but the compiler lexes it without any reference to the parser, and the parser builds the AST without any reference to symbols or semantic meaning.

> spell checking ... using context-free parsers

Only up to a point. D's spell checker uses the symbols in scope as its "dictionary". This means it is very good at guessing what you meant instead of what is typed, as the dictionary is very targeted. I.e. it uses the context, and the spell checker is part of the semantic pass.
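That idea is simple to sketch (illustrative Python, not D's actual implementation): the "dictionary" is just the set of symbols currently in scope, and suggestions are ranked by string similarity.

```python
# Scope-aware spell-check sketch: suggest the closest in-scope symbols.
# Function and variable names here are invented for illustration.
from difflib import get_close_matches

def suggest(unknown_identifier, symbols_in_scope):
    """Return up to three best guesses for a misspelled identifier."""
    return get_close_matches(unknown_identifier, symbols_in_scope,
                             n=3, cutoff=0.6)

scope = ["printf", "printLine", "parseInt", "counter"]
assert suggest("prinf", scope)[0] == "printf"
```

Because the candidate set is tiny and contextual, even a crude similarity metric produces very targeted guesses, which is the point being made about using context.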


A context free grammar has a formal and rigorous definition. One way to see that Dlang does not have a strictly context free grammar for its AST is the following snippet of code:

A[B] C;

That is parsed differently depending on whether B is a constant integer expression, in which case it parses to the declaration of an array of type A with a size of B, or whether B is a type name, in which case it parses into a hash map with keys of type B and values of type A.

This disambiguation can not be specified by the context free grammar and instead must be delayed until the semantic analysis (which is usually defined by a context sensitive grammar).

Pretty much all statically typed languages work this way with various kinds of ambiguities. In practice, it's not a particularly big deal. But that's kind of my point, there's no need to work exceedingly hard to force a language construct to be context free, especially if making it context free results in a lack of readability or awkward syntax.

It's perfectly fine to specify a superset of the language that is context free, parse an AST for the superset of that language and then resolve AST ambiguities in later phases, such as the semantic analysis phase or the type checking phase. Almost any tool written to assist an IDE with autocompletion or other functionality will have no problem working with the language superset.
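As a sketch of that approach for the `A[B] C;` example above (node and pass names are invented for illustration): the parser emits a single neutral node, and a later semantic pass consults the symbol table to disambiguate.

```python
# Superset-parse-then-disambiguate sketch; not D's actual compiler internals.
from dataclasses import dataclass

@dataclass
class IndexedDecl:      # neutral AST node for `A[B] C;`
    base: str           # A
    index: str          # B: constant or type name, unknown at parse time
    name: str           # C

def resolve(decl, symbols):
    """Semantic pass: decide static array vs. associative array from context."""
    if symbols.get(decl.index) == "type":
        return ("assoc_array", decl.base, decl.index, decl.name)
    return ("static_array", decl.base, decl.index, decl.name)

decl = IndexedDecl(base="A", index="B", name="C")
assert resolve(decl, {"B": "type"})[0] == "assoc_array"
assert resolve(decl, {"B": "const"})[0] == "static_array"
```

Tools that only need structure (highlighting, folding, completion) can stop at the neutral node; only passes that need meaning run the resolution step.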


> That is parsed differently

It isn't parsed differently. Its semantic meaning depends on what B is, but not the parse.


You realize the poster you are responding to is the creator of D? And has spent his career writing compilers? I'm not sure he needs to have someone explain what a context free grammar is.


I understand you wish to flatter WalterBright, but please note that your comment is fairly toxic and not well suited to Hacker News. Please stay on topic and feel free to contribute to the points being made. This kind of hero worship often just degrades discussion and is better left to other forms of social media.


Yours is the toxic comment. Appeal to authority is fine if the authority is an authority and WB has proven his. Suggestions of intention to flatter and accusations of hero worship are pretty gross. @kingaillas's point is entirely valid IMO.


> Most languages are context sensitive. Most language tools are context free. How did we go so wrong?

We never did. We always knew that any real-world useful grammar is multi-level and attribute (well, it's constraint-based, though depending on your definition of "attribute grammar", those are equivalent). That's why we so much like recursive-descent parsers: adding any multi-level constraints are so easy to them - you have a full Turing-complete language at your disposal, unlike simplistic 1-dimensional DSLs of most parser generators.


Could this be viewed as supply exceeding demand?

Context-free grammars are ripe for theoretical computer science work even if they're not practically relevant. On the flipside, I suppose the constraints of context-free grammars are seen as a price not worth paying when designing a language.


Can you give an example (or three) where that lets us down? That would be very helpful to me I suspect.


Examples of which bit?

Languages that are context-sensitive? C, C++, Java, JavaScript.

Examples of tools based on the starting expectation that languages are context-free? Yacc, Bison, Jay.

Examples of the problems this causes? Well we're using the wrong tool for the job, right from the start. Instead of using an appropriate tool the first thing we do is bend our tools out of shape. We use side-effect actions to subvert the tool's model in an uncontrolled way. We don't get the full benefits of the tool's original model and can't rely on its guarantees.


> Do you think it is the responsibility of academia to teach "industry relevant stuff"?

In college I learned a lot of math and how to do things like calculate stress and bending. I learned nothing about material finishes, material selection, fastener selection, tolerances, drafting, etc. But the latter was easy to pick up on the job, while learning mathematical analysis on the job just doesn't happen.


And learning PHP, React, and MongoDB is easy to do on the job.

However, I think that there are industry practices which may inform what students are taught. For example, process algebra is rarely (if ever?) used in industry. There are other formalisms of comparable difficulty and academic interest which are used however, perhaps teach those instead?


I definitely think there ought to be courses on such things available. There should be an educational route between pure academia, which focuses on esoterica that is only relevant to research, and vocational courses that don't include much in the way of theory at all. The idea that industry ought to pay for it presumably comes from a background where universities are private and expensive. But I'd like to live in a world where these kinds of courses were government subsidised and available for free or cheap.


> The idea that industry ought to pay for it presumably comes from a background where universities are private and expensive.

In the US at least, where even public universities are ridiculously expensive, there's no way to keep the current system running unless our grads are able to get jobs that pay well when they graduate. Indirectly industry is paying for the courses taught by universities. In the current environment we need to prepare students for employment.

> I'd like to live in a world where these kind of courses were government subsidised and available for free or cheap.

I think we'd be better off as a country (US). I also think that ship has sailed and it sank to the bottom of the ocean a couple of decades ago.


I tend to think of most university education as taxpayer and student subsidized versions of the sort of training that companies should be doing. I think, for most programmers, a well-thought-out apprenticeship program would be better both for them and for the company, except that the amount of money we send to universities makes this economically infeasible.


> Do you think it is the responsibility of academia to teach "industry relevant stuff"?

I mean, there are colleges that cater specifically to industry. Digipen is a good example of this: they have very close relationships with game studios and shoehorn their curriculum into the requirements the state imposes on colleges.

These sorts of colleges are not particularly popular, however, so I think most prospective students understand the value of a more broad and timeless approach to software education.


I understand the problem.

I teach compilers at a big university. And I would love to hire graduates with compiler skills for my startup, but find it difficult.

There are several structural problems that conspire to keep students from acquiring knowledge in low-level programming domains like compilation. A course needs to fit with the rest of the curriculum, and with student interest. I cannot get students interested in lexing and parsing, because they hear all day how exciting deep learning and big data are. Lexing and parsing are deemed to have been solved in the 1960s, a sentiment I disagree with. In addition, classical theoretical computer science (e.g. automata and languages) is rarely taught with enough depth in many universities, so students lack the technical background. You can cover this in a compilers course (I do), but it takes time, and the students see it as an ordeal.

Compilers as a subject is not helped by the most widely recommended book being the Dragon Book, which is dated and way too long to be useful for beginners. Compare the Dragon Book with the material available as an introduction to machine learning... Many of the existing textbooks also don't cover modern compilation themes, in particular JITs and compilation for multi-core and GPUs. I'd say there is currently no good undergraduate textbook on compilers. I could probably, with a lot of work, cobble something reasonable together from my lecture notes, but I don't believe in books; I'd like to do a good MOOC, but building suitable software infrastructure requires more work than I am currently willing to put in.

My department tried hard to remove the compilers course, as "no longer relevant", and "too hard". I threatened to resign if it was not kept as a mandatory course. For now this worked.


I never took a compilers class during university study and it is my greatest regret. I've since been self learning and it hasn't been easy.

Thanks for advocating for the importance of compilers. This lack of knowledge has been a limiter for me many times. I'd really like to fill a skill gap and be able to write a DSL or a little language to solve a problem.


From my experience in industry, while hardly any programmer will ever need to write a modern code generator, few things are more useful in practice than being able to quickly design a suitable DSL and implement a corresponding performant and correct lexer and parser. This is not something that can easily be picked up on the side.

In addition, understanding what happens under-the-hood when you program in a high-level language is an eye-opening experience for many students. This sentiment comes up in student evaluations year-in, year-out.


I never understood this. I took the compilers class at CMU, and I loved it. I went on to take a follow-up class implementing a compiler with higher-order types. Meanwhile, most of my classmates avoided compilers altogether.

I've worked in the industry since undergrad, and I've implemented several interpreters for DSLs. Sometimes I was asked to make it, but other times I had the flexibility to dream up a DSL on my own, and it ended up becoming indispensable to users. I've always loved building them, but to this day, parsing is still the most boring part to me.

I recently went through Bob Nystrom's Crafting Interpreters, and it was great fun. I implemented an interpreter in Rust. Material about compiling for GPUs would be interesting. But for me, personally, I really wish there were something updated for how IDEs use languages nowadays, in particular, walking through a good way to implement the language server protocol.

The calculator language that every compiler book starts with should be LSP-first.

A MOOC would be great, but I'd also be happy with a series of blog posts.

Does anyone know of good material bridging the gap between traditional compiler architecture and the LSP?


> Compilers as a subject is not helped by the most widely recommended book being the Dragon Book

Is it? Even I, an outsider [to modern university courses], can see that the baseline has quite shifted to the Tiger Book (aka Appel, Modern Compiler Implementation in [your choice]).


Is it at all possible that compilation is traditionally taught back to front? I am just a hobbyist; my CS degree did not include anything about compilation other than the theory of types of grammars. I am the first to put my hand up and say I do not really know what I am talking about.

For me, compilers are interesting because of (a) optimisations: at my level, simple logic that is obviously advantageous applied to a CFG; and (b) interaction with the dependencies and operation of low-level hardware: how does it REALLY work?

In the hobbyist pursuit of understanding, I have become belligerent toward parsing. Parsing is interesting, but also not really. I would have benefited substantially from an abstract introduction via graphs, analysis and lowering, both in terms of interest and of fundamental understanding.


Thanks. I'm actually very interested in lexing and parsing, because that's probably 99% of what a layman programmer gets to do with compiler theory on the job. I mean, not everyone gets the chance to write the backend part, but parsing skill is used almost universally.


A fun exercise is writing a lexer generator or a parser generator. It's probably easier to start out with writing a lexer generator like Flex for your favourite language. I recommend using the Flex input format; then you can test your creation using Flex as a testing oracle! This is not something you can do in a weekend.


Thanks mafribe, looking forward to your future book (if there is one) :D


Your HN profile is empty, so there seems to be no good way to check out your institution's program or to contact you.


No offence, but parsing (in a compiler, not in general) is probably the most boring part of a compiler.

As this is a PhD course, I'd expect the goal of it to be preparing students for research in the field, not writing parsers for some industrial company.

Also, there are definitely courses that teach you how to write a parser. But we usually don't call them advanced.


> No offence, but parsing (in a compiler, not in general) is probably the most boring part of a compiler.

Not if you're writing an IDE. Writing a batch-mode parser that takes an entire correct source file and produces an AST is easy. Writing a parser that instantaneously produces an updating AST while the user is typing in the middle of the program, while still allowing semantic analysis to occur, and without vomiting out a pile of extraneous errors or giving up entirely, is a lot harder.
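For a concrete taste of the difference, here is a toy sketch (all names invented) of one error-recovery idea an IDE-grade parser needs: instead of aborting at the first bad token, it emits a placeholder error node and keeps going, so later phases still get a usable AST while the user is mid-edit.

```python
# Toy error-tolerant parser for sums like "1 + (2 + 3)".
# On an unexpected token it records a message and returns an ("error",)
# placeholder node instead of raising, so parsing continues.

def tokenize(src):
    return src.replace("(", " ( ").replace(")", " ) ").replace("+", " + ").split()

def parse(tokens):
    pos = 0
    errors = []

    def primary():
        nonlocal pos
        if pos < len(tokens) and tokens[pos].isdigit():
            tok = tokens[pos]
            pos += 1
            return ("num", int(tok))
        if pos < len(tokens) and tokens[pos] == "(":
            pos += 1
            node = expr()
            if pos < len(tokens) and tokens[pos] == ")":
                pos += 1
            else:
                errors.append("expected ')'")  # recover: pretend it was there
            return node
        errors.append(f"unexpected end or token at position {pos}")
        return ("error",)  # placeholder node keeps the AST usable

    def expr():
        nonlocal pos
        node = primary()
        while pos < len(tokens) and tokens[pos] == "+":
            pos += 1
            node = ("add", node, primary())
        return node

    return expr(), errors

# A complete input parses cleanly; a half-typed one still yields an AST
# plus one focused error instead of a crash or an error cascade.
ast, errs = parse(tokenize("1 + (2 + 3)"))
ast2, errs2 = parse(tokenize("1 + "))
```

A real IDE parser adds much more on top (incremental reuse of old subtrees, resynchronization at statement boundaries), but the placeholder-node idea is the seed of it.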


> Not if you're writing an IDE.

Phew, how boring! I know compiler people who (apparently) spend their time at day jobs writing IDEs, then get back home and write books on good ol' "batch mode" processing. Can't wait for them to add a JIT chapter. But imagining they would work instead on a "writing an IDE" chapter would be rather funny ;-).


Heh. :)

You could consider this an observation that writing a batch mode parser is so much easier that I'm comfortable writing a book about it, but I still don't know enough to write about IDE-grade parsers and analyzers.


> Not if you're writing an IDE.

This thread is so weird. The topic at hand is a compilers course. ibains complains about a desire for qualified IDE developers. It's like complaining that your family doctor isn't an orthopedic surgeon. Why should a compilers course spend the requisite time on these highly specialized IDE algorithms?

Specifically: what part of this course should be removed to make room for IDE-specific algorithms? Or would an advanced IDE algorithms course (with this course as a prerequisite) be a more sensible approach?


He didn't say easy, he said boring, and I agree.


I wouldn't describe it as boring either, but maybe that's just me.


I saw this great interview recently with Anders Hejlsberg at MSFT on how modern compiler construction differs from what is taught in traditional University courses. Is this what you're alluding to?

After watching that interview, it's strange to read a comment like yours. While the architecture is completely different, it doesn't frankly seem like that big a leap to go from "senior engineer with X years working on thing that's tangentially related to compilers" to being able to be productive in a reasonable amount of time working with the new architecture. What am I missing?

https://youtu.be/wSdV1M7n4gQ


I will say that I asked Shriram Krishnamurthi why books on interpreters and compilers tend to avoid the question of parsing like the plague, and found his response a little unsatisfactory.

All these celebrated texts like SICP, EOPL, and Lisp in Small Pieces avoid the question of parsing altogether by focusing on s-expression based languages.

His response was effectively that parsing is over-emphasized in other books and is truly orthogonal to other PL activities. I can understand that, but in practice, when I need a parser in my day job, I usually don't have much say in the structure of the input "language" (since I'm usually parsing something coming from another source that I have little control over). And if you have an instructor in a compilers course with this point of view, what other class would give you an opportunity to learn parsing in a formal setting?


A full half of the compilers course I took in undergrad was about {LA,S,}LR; I think "parsing is over-emphasized" is alive and well... That said, it didn't cover non-table-based parsing much. (On the other hand, I'm unaware of a non-table-based parsing algorithm that can statically warn you about problems with the grammar (ambiguities etc).)


Hey, I know Shriram! We were students at Rice together. Not saying we knew each other well, more like passing acquaintances, but I recognize his name.

Anyway, as I recall from compilers classes (it has been ~30 years), the Dragon Book heavily emphasized parsing, but the professor said "I'll assign you a parser to write by hand for homework, and that's it".

The overall attitude was a combo of: 1) parsing theory is also covered in another theory-of-computation class (where the classic textbook at the time was "Introduction to Automata Theory, Languages and Computation" by Hopcroft and Ullman, but these days it may also be Sipser's book); 2) there were so many other topics to cover in compilers that a week or so of parsing was all the time budget allowed; and 3) the professor's research focus was in other areas, and compiler research in general was the same.

So not enough attention to parsing might be a (perhaps unfortunate) side effect, even though there is enough material for an entire class on parsing.

I took the advanced course in compilers too, which was more like a topics class - a research paper every week or so. I don't remember anything about parsing there either.


> why books on interpreters and compilers tend to avoid the question of parsing like the plague

They avoid it like an old, boring anecdote everyone already read in their childhood in the Dragon Book.

> parsing is over-emphasized in other books, and it's truly orthogonal to other PL activities.

Right, "parsing science" != "compiler science". Parsing science isn't even a subset of compiler science; it's another discipline that touches (some may even say overlaps) it. Consider parsing a natural language. (Which, by the way, every human does, yet compiler stuff is surely not for every human.)


It's pretty hard to pick up compiler skills because very little of it is written down. It takes a lot of time (a few years) working with code bases and papers to absorb it.


> very little of it is written down

> and papers to absorb it.

I smell a contradiction. But I'm glad that even well-known industry people see it like that. It's not a common flu you can pick up at the university library. It's arcane lore you need to travel far away and dig into the ruins of the Library of Alexandria to find.


I don't see the contradiction: very little of it is written down, so you need to spend a lot of time scraping around for the disparate parts that are written down and trying to absorb as much as possible from them. There are very few books (in some parts of the field, none) that bring together all the information.


Good, good. Reminds me that stealing your https://rubybib.org/ idea for Python has been on my todo list for a long time ...

Actually, my own reference to the Library of Alexandria, initially made by word of mouth, is a literal situation in some parts of the realm. Reminds me I wanted to put up a mirror of Rice's Massively Scalar Compiler Project materials somewhere. In my mental map, Cooper and his colleagues literally saved SSA from IBM's freezing hell (quoting Kenneth Zadeck: "What Happened Next / We stopped working on SSA. / None of us actually worked on a compiler project. / We were blocked from transferring SSA to the IBM product compilers.").

I really should do that over the winter holidays. By then, we'll also know whether http://ssabook.gforge.inria.fr/latest/ has actually gone down. But at least that is already mirrored...


Yes, more validation - compilers as taught in universities need improvement.

So, a startup of 10 people should hire and train people for months? Raise a $2M seed round, have 12-18 months to show product-market fit, and as a founder teach all engineers their jobs? How many engineers go into compilers after school? So university education is all they know. The industry is built on qualified engineers. NVIDIA's CEO was ahead of the curve on graphics cards, Google's founders were ahead in search; why are the compiler folks behind the curve?


> Why are the compiler folks behind the curve?

I suspect it’s because there aren’t a lot of high-quality libraries you can integrate into the backend of compiler tools without running into license issues pretty fast. Imagine if GNU binutils were more permissively licensed and as modular as Clang: then developing novel, non-GPL’d compiler infrastructure could depend on BFD, the boring part that working on won’t bring bona fide improvements to your new compiler. Another factor is that LLVM’s quality and ubiquity have reduced the monetary and technical upside to pursuing new opportunities in compiler development.


Huh, he's talking a lot about how doing everything incrementally is infeasible; isn't this how Rust's compiler/tooling infrastructure works?


Parsing is the frontend of a compiler. That the compiler course focuses on the compiler backend does not mean it's not relevant to industry. I personally find aliasing, flow analysis, and optimization very relevant to what we needed to do.

As for industry vs. academia, wouldn't it be fair to say that professors should teach whichever is more advanced? I certainly don't believe that teaching Java generics, no matter how relevant, is appropriate for a PhD program, which is supposed to push the state of the art of a subject.


> Good luck hiring an engineer with advanced compiler courses knowing any of this.

Have you tried raising the level of compensation you offer? There are engineers elsewhere doing this at other companies whom you could presumably get if you put your hands in your pockets.


“We can’t hire for x” almost always means “we can’t hire for x for what we’re willing to pay”


I'm normally fairly laissez-faire about the morality of "exploiting" employees, but that works both ways - if you can't find people for the amount of money you're charging then tough shit


Yeah, we can argue about the merits of capitalism vs socialism, but what some companies want is "socialism for us, capitalism for our employees".


We pay very very well, all my peers from NVIDIA are above 700k. Why would you presume anything about salary?


It's not true that there aren't engineers with these skills - there are thousands of them in the US and more worldwide - so if you can't hire them then either:

* you aren't looking (seems unlikely)

* there's something toxic about your team that means they'd never work for you and you should fix that (hopefully unlikely)

* you aren't paying enough to make the move worthwhile for them (seems likely as people never want to pay the going rate)

That's why I presumed salary. Double your salary and you may suddenly find you're being flooded with contacts from the compiler engineers at places like Google. If that's the case then your only problem was the compensation you were offering and not a problem with education.


Consider making that more widely known.

Until reading this, it would never have occurred to me that a compiler tech at Nvidia would get anything remotely like that figure.

To be honest, as someone interested in compiler tech roles, it never occurred to me to even consider Nvidia until reading this discussion.



> We pay very very well, all my peers from NVIDIA are above 700k.

That's... an impressive number.

> Why would you presume anything about salary?

Because that's usually the reason, especially when speaking of what universities teach. "We can't find people who know $TECH" is, under those circumstances, almost always corp-speak for "We can't find new grads who already know $TECH to fill our entry level positions for what we're willing to pay in order to replace the employees that learned it on the job and then leave when we don't give them raises after a couple of years."

Have you posted the positions you're trying to fill here on the monthly "Who is Hiring?" thread?

As an out of the box suggestion, consider sponsoring the creation of an online course (like a Udacity nanodegree, perhaps) that covers the curriculum material you consider missing. A comprehensive set of 'build your own programming language and tooling' courses would probably be quite popular.


Are you hiring massive programming language design & implementation enthusiasts familiar with the whole stack (parsing, type systems, verification, compilation, runtime, GC, ...) including all kinds of different parsing algorithms?


Please don't... your hobby proglingo... will go unmaintained... :-D


College is for imparting foundational knowledge. It isn't vocational training. You should find capable employees and train them for your needs.


> These courses teach very little judgement or industry relevant stuff.

Hasn't that been the debate for, like, ever?

Should the universities teach you how to learn or should they teach you how to hit the ground running at your first post-graduation job?

I'd say anyone who took phd-level courses on compiler design shouldn't take too much training to become a valuable team member as they've proven they can learn what is important for the given task but maybe that's just my uninformed opinion.


I feel like "teach you to learn" vs "teach you useful skills" is a false dichotomy, and often just an excuse for poor teaching. They can and should do both. The reason they don't is that they're so research-focused, which is fine as an option, but a problem because there are no, or very few, good options for high-level academic courses that are more practically focused.


> I’d like to sit down all university professors who teach compiler courses and teach them a course on what’s relevant.

Courses should be used to teach general concepts, while industry should be used to show how to apply those concepts. The choice between a packrat parser, LALR, or LL is easier to explain to someone who has been exposed to those concepts and understands the advantages and disadvantages of each approach than to someone who has never heard the terms.

Best practices change with time, but knowing the base concepts and how to apply them will help people adapt. If they're only exposed to what's considered relevant at this time, then they won't have the necessary exposure to concepts that may pertain to future best practice.


> I can’t hire compiler people with right skills.

You can hire bright, motivated and mature people and teach them the skills they need. Academia doesn't exist to provide profit-maximizing, ready inputs to business. They produce people who have the skills to (hopefully) be able to grow into any position.


I have some idea what I'm talking about (see my post https://news.ycombinator.com/item?id=25210523), and I don't know why you'd use a packrat parser over anything else. I pull something off the shelf, ANTLR for my latest project, then get on with it.

The parser is the most trivial side of things; if you consider that a key skill, it comes across as an issue from the 1970s, not today. Writing a recursive-descent parser by hand is not a big deal anyway, as someone else said.

If you're into optimising compilers, you're focusing on a very odd area to criticise. I'd have thought more about dataflow using lattices or whatever, graph stuff, etc.

So, parsing aside, what are the 'relevant' skills you feel are missing?

(And what are 'database compilers'? The nearest thing I can think of that may fit is query optimisers.)


What resources do you recommend I read, or what can I do, to learn more about applicable compiler knowledge?


Prev post by me, if it's any help https://news.ycombinator.com/item?id=25210523


Any resources you can suggest to point interested engineers in the right direction?


Unless you're looking for an external consultant for the one month project, you're doing it wrong: https://www.reddit.com/r/ProgrammerHumor/comments/4k994j/if_...


As someone who is happy when he interviews a candidate who at least knows you don't start parsing with `string.split()`, I gotta say, your bar is extremely high. Surely MSFT can afford some internal training?


> I’d like to sit down all university professors who teach compiler courses and teach them a course on what’s relevant.

I'm sorry if you've already answered, or if it comes up a lot, but could you toss up some buzzwords/more industry-standard things you're looking for? I was going to look into this course, and still probably will, but if you're saying it's not relevant I'd like to know what to be looking into instead?


Parser combinators allow elegant implementations of parsing but are ridiculously slow compared to some Knuth-optimized parsers.
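To make the trade-off concrete, here is a minimal combinator sketch (illustrative only, not any particular library): a parser is a function from an input and a position to a list of possible results. The elegance comes from composing small functions; the cost is that naive alternatives try everything and can re-parse the same span many times (memoizing on (parser, position), the packrat trick, trades memory for speed).

```python
# A parser here is a function (text, pos) -> list of (value, next_pos).
# An empty list means "no parse"; multiple entries mean ambiguity.

def char(c):
    def p(s, i):
        return [(c, i + 1)] if i < len(s) and s[i] == c else []
    return p

def seq(p1, p2):  # p1 then p2, pairing their results
    def p(s, i):
        return [((r1, r2), k) for r1, j in p1(s, i) for r2, k in p2(s, j)]
    return p

def alt(p1, p2):  # p1 or p2, trying both (this is where backtracking costs)
    def p(s, i):
        return p1(s, i) + p2(s, i)
    return p

# Grammar: "a" followed by ("b" | "c")
ab_or_ac = seq(char("a"), alt(char("b"), char("c")))
```

For example, `ab_or_ac("ac", 0)` yields `[(("a", "c"), 2)]`, while a non-matching input yields `[]`.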


As elegant as parser combinators are, it is not so easy to work out exactly what grammar they are implementing.


Packrat parsers are fast (at the expense of some memory).


I am currently writing a parser using combinators. How can I reach you to learn which method is proper to use in each case?


Perhaps you should guest lecture at a university and have them record all the lessons? Your kind of experience is rare.


Those mean academics! How dare they teach anything else than what you need right now.


I second this (although I work more on translation and modeling than optimization). IME, it's much easier to find otherwise competent, savvy researchers and spin them up on compiler work than it is to take someone who's gone through an "advanced" course.


Is that such a critical hiring criterion? It seems like something I could learn in a week.


> That seems like something I could learn in a week.

So why not learn it in a week then apply for the job?

I would suggest the reason is... because you cannot learn it in a week.


>So why not learn it in a week then apply for the job?

For a lot of reasons. I'm not looking to work in compilers. OP sounds like the opposite of the type of people I want to work with. I enjoy what I work on quite a bit. I am paid well. I have the chance to build up the business and a team.

It is just an attitude that I see a lot. The candidate doesn't have XYZ specific knowledge. Find someone who is interested and motivated (e.g. someone who has independently taken an advanced course in a topic) and build them up.

I'm sorry that some candidate can't answer your trivia question, but if it can be taught in a 16-week course (~10 hours/week), you could find a way to convey it on the job (~40 hours/week) in a much shorter period of time.

For this scenario, have new hires work through a series of designed problems. Have them implement and run into the pitfalls of LL, LR, and packrat parsers. Show issues with left recursion, shift/reduce conflicts, the space trade-offs of packrat parsing; seriously, whatever you want to demonstrate.
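As a sketch of the first of those pitfalls, here is a hypothetical toy showing why a left-recursive rule breaks naive recursive descent, and the standard rewrite into a loop:

```python
# Grammar rule: expr -> expr "-" num | num
# Transcribed naively, expr() calls itself before consuming any input.

def parse_num(tokens, i):
    return int(tokens[i]), i + 1  # assumes tokens[i] is a number

def expr_left_recursive(tokens, i, depth=0):
    if depth > 50:  # stand-in for the stack overflow a real parser hits
        raise RecursionError("left recursion never consumes input")
    left, j = expr_left_recursive(tokens, i, depth + 1)  # loops forever
    return left, j

def expr_iterative(tokens, i):
    # The same rule with left recursion rewritten as a loop: num ("-" num)*
    left, i = parse_num(tokens, i)
    while i < len(tokens) and tokens[i] == "-":
        right, i = parse_num(tokens, i + 1)
        left = left - right  # left-associative, as the grammar intends
    return left, i

# (10 - 3) - 2 = 5, not 10 - (3 - 2) = 9
value, _ = expr_iterative("10 - 3 - 2".split(), 0)
```

A table-driven LR parser handles the left-recursive form directly, which is exactly the kind of trade-off the exercise above would surface.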

Even just write up a document explaining when to use each, rather than bemoaning the lack of knowledge. Like, how much cooler would the top comment in this thread have been if I had learnt something?

This feels a lot like "I want a candidate with the knowledge of a senior for the price of a recent grad".


You can't gain experience in a week, but you aren't working alone.

This is an example of know-how, not theory. Different problems require different things.


This would lead to incredibly toxic situations. Suppose I want to apply to 5 companies, and all 5 have one such question, each different, that I could learn in a week. Would you consider it more efficient for any party for me to learn all of this in advance?

Also, this assumes it is known upfront what the required knowledge would be...


This is such an empty criticism. Universities are not job-training facilities. You want people with exactly the right skillset? Hire smart kids and train them, you know, the way it's been done for hundreds of years.


I agree in general. I have a PhD in computer engineering and know a lot of academic subjects, but only a small portion is especially useful in a practical setting. A lot of academic knowledge also seems needlessly obtuse; it becomes much clearer when you look at the practical problems that spurred the academic fields. At times I get the aha moment as to why some abstract concept addresses a meaningful question, and if academic education oriented itself around these basic meaningful questions, it would help people learn much more effectively.


> When do you use packrat parser vs LALR vs LL?

This is not something you can expect candidates to know.


Domain-specific compilers are a bigger field, where there's more opportunity to do something interesting

Take C (or some other language) and its mature optimizing compilers as a foundation, and write a program that compiles a high-level description of a domain-specific program into C
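As a hedged toy illustration of that approach (every name here is invented for the example), such a "compiler" can be as small as a function that turns a declarative pipeline description into C source, leaving the hard optimization work to the C compiler:

```python
# Toy domain-specific compiler: take a tiny declarative description of an
# elementwise pipeline and emit a stand-alone C function for it.

def compile_pipeline(name, ops):
    """ops: list of ("scale", k) / ("add", k) steps applied to each element."""
    body = []
    for op, k in ops:
        if op == "scale":
            body.append(f"        x = x * {k};")
        elif op == "add":
            body.append(f"        x = x + {k};")
        else:
            raise ValueError(f"unknown op: {op}")
    return "\n".join([
        f"void {name}(const float *in, float *out, int n) {{",
        "    for (int i = 0; i < n; i++) {",
        "        float x = in[i];",
        *body,
        "        out[i] = x;",
        "    }",
        "}",
    ])

c_source = compile_pipeline("normalize", [("scale", 2), ("add", 1)])
```

Real domain-specific compilers like the ones below do vastly more analysis, but the shape (structured description in, mature-language source out) is the same.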

For example, NN-512 (https://NN-512.com) is a compiler that takes as input a simple text description of a convolutional neural net inference graph and produces as output a stand-alone C implementation of that graph

GNU Bison (https://www.gnu.org/software/bison/) is another example, where the output is a parser: Bison reads a specification of a context-free language, warns about any parsing ambiguities, and generates a parser (either in C, C++, or Java) which reads sequences of tokens and decides whether the sequence conforms to the syntax specified by the grammar

Halide (https://halide-lang.org/) is a language for fast, portable computation on images and tensors, and it originally generated human-readable C++ code as output (but not anymore, now it feeds directly into LLVM)

Can anyone provide other examples? (Code generators for network packet processing?)


> Take C (or some other language) and its mature optimizing compilers as a foundation, and write a program that compiles a high-level description of a program into C.

Don't you still need many of the techniques taught in this class to build something like that? The material doesn't look inherently targeted at low-level languages or machine-code as a target.


One way to think of it is to split the "compiler" problem into three big pieces: 1) front end, 2) back end, 3) tooling.

The front end is lexing, parsing, building the AST (in whole, or just keeping pieces of it around), and semantic analysis. When that is done, you can say, "Yup, valid program. It is possible to generate code." So, to your question: yes, those techniques from the course are 100% in play.

The back end is turning the AST into something you can optimize and generate code from. (Sometimes there is a "middle end" in there....) High-level optimizations, low-level optimizations, memory allocation, register allocation.

Tooling is everything that keeps your life from being miserable as a user -- starting with debug symbols and these days people expect library and package management, etc, etc.

So if you are interested in exploring the Next Great Syntax or some semantic problems that can be rewritten into C, then doing a Cfront-style translator is a great way to do that. Let the C compiler handle all the really-really-really-hard back-end issues for you, and avoid all that distraction. But... expect that getting debug symbols from your spiffy language into the ready-to-link object file is going to be, as they say, "non-trivial". So... great for experiments and a great learning exercise, but it's hard to do a production compiler that way.


Hi! This course is about the "middle end," FWIW. We do not do parsing or codegen in 6120, and there is no goal (as there is in many undergrad-level compilers courses) of making a complete, end-to-end compiler for a C-like language.


I have just spent the last 13 months significantly reworking a compiler that turns Excel-like formulas into JavaScript. The compiler is used by an app designer, and the formula language is meant to be easy to pick up for people used to spreadsheets. A significant difference compared to Excel is that our compiler features static typing, enabling helpful error messages to be produced at build-time.

(Spreadsheets, on the other hand, tend to try very hard to make sense of types at runtime, using type coercion to turn values of the wrong type into something of the expected type they can work with. That produces surprising results and confuses users. =IF(2, 3, 4) will generally produce 3, because 2 -- which should be a logical/boolean value -- is interpreted as TRUE. We think it's a better idea to warn the user that the first parameter is wrong, and to do that as soon as possible.)
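A minimal sketch of what such a build-time check might look like (the function and type names here are invented for illustration, not Calcapp's actual code):

```python
# check_if takes the inferred types of IF's three arguments and returns
# a list of build-time error messages (an empty list means the formula is fine).

def check_if(cond_type, then_type, else_type):
    errors = []
    if cond_type != "logical":
        errors.append(f"IF: first parameter must be logical, got {cond_type}")
    if then_type != else_type:
        # a real checker might compute a union type here instead
        errors.append(f"IF: branches disagree: {then_type} vs {else_type}")
    return errors

# =IF(2, 3, 4): the condition is a number, so warn at build time...
if2_errors = check_if("number", "number", "number")
# ...while =IF(A1 > 3, 3, 4) passes cleanly
ok_errors = check_if("logical", "number", "number")
```

The point is simply that the wrong-type warning fires before the formula ever runs, rather than being coerced away at runtime.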

By far the largest challenge with reworking the compiler has been the type system. I have expanded our type system from the four standard spreadsheet types to encompass more than 80, as well as added support for lambdas (for use with functions like SUMIF and FILTER), type inference, parameterized types and union types. The biggest addition is probably an imperative mode that executes formulas with side effects in response to events being fired. This is all in service of features we have wanted to add for years, which have been held back by our (previously) simplistic type system.

I did take a compiler class at university more than 15 years ago, and while it proved very useful when writing the formula parser (an operator-precedence parser, initially straight out of a textbook), type theory was not part of the class. I had to pick up bits and pieces by reading the Java Language Specification (JLS) and by experimenting.

The Cornell class doesn't seem to feature type theory either, which I think is unfortunate. Now that the pendulum has swung, yet again, firmly in the direction of static typing, perhaps static typing will become a bigger part of CS compiler curriculums.


Your project sounds like fun. What/for whom is the project? What level of Excel formula compatibility are you shooting for, e.g. just in the neighborhood like Google Sheets, reasonably close as in Libre Office, or full compatibility with the latest version? If the latter you must have one gnarly test suite. Would love to hear more about it.


OK, I see it's https://www.calcapp.net which looks like a smart product idea. My best to you. Would still love to hear more about the other questions.


Thanks!


The compiler is for Calcapp, an app designer for Excel-savvy users. Our aim is to offer a function library that is easy to pick up for our target audience. That means that we generally follow Microsoft's lead in terms of naming and parameters, but we often offer extensions. Here are a few examples:

* Functions like SUMIF operate on a range or an array and use a textual mini-language to determine which values to operate on. =SUMIF(B1:B2, ">3"), where B1 contains 2 and B2 contains 4, returns 4, because only B2 satisfies the ">3" condition. We're not big fans of that mini-language (which expects you to use the string concatenation operator & to handle dynamic values). We support it, through an overloaded function, but also offer a version which takes a lambda as the second parameter. In Calcapp, SUMIF({ 2, 4 }, Item > 3) and SUMIF({ 2, 4 }, E -> E > 3) (with a named lambda parameter) are equivalent to SUMIF({ 2, 4 }, ">3"). The lambda syntax enables much more complex conditions to be expressed.

* Some functions like SORTBY take a sort order parameter, which is an integer where 1 means sort in ascending order and -1 means sort in descending order. We support functions with integer sort order parameters, but also offer an overloaded version taking an enumerated value instead (SortOrder.ASCENDING or SortOrder.DESCENDING), which we think helps with readability.

* Excel offers logical functions like AND, OR and NOT. We like the C-style operators &&, || and ! better, so we support them in addition to Excel's functions. https://www.calcapp.net/blog/2017/09/16/logical-operators.ht...

* Excel's logical operators (like =, <> and <=) return arrays when applied to arrays and ranges. For instance, { 1, 2 } = { 1, 3 } returns { TRUE, FALSE }. (This is easier to see in recent versions of Excel, where array results "spill," meaning that a return array with two elements spills to occupy two cells of the grid. In earlier versions, as well as in Google Sheets, you'll have to use INDEX to tease out the results.) We support = and <>, but also == and !=, where the two latter operators always return scalar (non-array) values. == enables users to determine if two arrays are identical, without having to use AND on the result array (which is the standard Excel trick). Also, our == operator is more traditional in that string comparisons are case-sensitive (unlike =).

* Experienced Excel users like to use double negation to turn logical arrays into number arrays. We have a specialized operator, --, for that, because allowing two unary operators to follow one another would introduce other issues. Here's an example: https://exceljet.net/excel-functions/excel-sumproduct-functi...

* The Excel operators * and + can be used with logical arrays and ranges to signify logical AND, as well as logical OR, respectively. (https://exceljet.net/formula/sum-if-cells-contain-either-x-o...) We have overloaded versions of those operators that take logical arrays for that reason. (Again, Calcapp uses static typing, which makes this a lot more complicated for us than it is for Excel.)

* The latest versions of Excel have great support for dynamically-allocated arrays (along with new array functions like FILTER and SORT). Their new calculation engine supports users providing arrays where, traditionally, non-array parameter values are expected. If arrays are provided for these parameters, then an array is returned. For instance, =SQRT({ 4, 16 }) returns { 2, 4 }. Microsoft internally calls this "lifting" parameters. We refer to these parameters as being "array-capable." In our type system, an array-capable type is a union type (T|Array<T>), with the notable difference that if an array is provided to an array-capable parameter, then the return type is no longer T, but Array<T>.
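A small sketch of how that lifting rule could be modelled (illustrative only, not Calcapp's actual implementation):

```python
import math

def lift(f):
    """Turn a scalar function T -> T into one typed T|Array<T> -> T|Array<T>."""
    def lifted(x):
        if isinstance(x, list):        # Array<T> provided...
            return [f(v) for v in x]   # ...so the result is Array<T>
        return f(x)                    # plain T in, plain T out
    return lifted

SQRT = lift(math.sqrt)

# =SQRT(16) gives 4; =SQRT({ 4, 16 }) "spills" to { 2, 4 }
scalar_result = SQRT(16)
array_result = SQRT([4, 16])
```

In a statically typed setting, the interesting part is the return type depending on the argument: providing Array<T> to an array-capable T parameter flips the whole function's result to Array<T>.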

We do have a test suite for the runtime formula function implementation with a couple of thousand unit tests. Given the number of supported formula functions, though, we should probably have many more.

Most of what I have described here pertains to a new Calcapp version, which hasn't been released yet. In particular, while I am done implementing the compiler, it still needs to be documented (lots of Javadoc), user-facing documentation needs to be written and the runtime system needs to be implemented, so we still have a few (fun!) months to go before everything is ready.

Thanks for taking an interest in what we do.


Looks like you’re doing everything right. That is just a fun project. Thank you so much for sharing.


MATLAB Coder - Generate C and C++ code from MATLAB code https://www.mathworks.com/products/matlab-coder.html

Actually the whole family of embedded code generation products from MathWorks https://www.mathworks.com/solutions/embedded-code-generation...


AnyDSL (https://anydsl.github.io/) might be worth mentioning, it is a framework for creating domain specific languages using partial evaluation to give powerful abstractions to the user while still producing high-performance code for different target platforms. On the website, you can find that they already have DSLs for ray tracing and image stencil codes as well as works for bioinformatics.


Not open source but I'm working on a system that turns a DSL (embedded into Jupyter notebooks) into either c or c with opencl for a niche finance application. Not a huge project (<10k lines of code). Nature of the problem means that DSL can be very simple (it's aimed at non-programmers). This approach also means that could easily add other targets - e.g. cuda or wasm.


Awesome. Link?


Thanks and sorry - not open source and in testing with single client at the moment.


Do you have a structured course for Domain-specific compilers available online?


A fantastic submission, very much appreciated.

I find the presentation style of subtitles with surrounding context extremely helpful. I often lose the ability to listen when concentrating; the subtitle context provides a superior recovery mechanism to the usual, try to work out what was missed. I look forward to purchasing augmented reality glasses at some point in the future which add this advantage to in-person interactions.


Yes!

It’s the complete opposite of, and significantly more user-friendly than, how Zoom recordings synchronize meeting subtitles with audio/video playback.

In their case they only show the current line in a text message-like sidebar. As the content plays, it snaps the scroll position of the messages to current, hiding the rest. This breaks scrolling ahead or back while listening and you lose any context.

Hopefully I’m just missing a setting - I try to minimize the time I spend in there. Is there a way I can take their damn horse blinders off?


I too appreciate the subtitles and I tend to have them on when I watch and listen to a DVD. You can even click on a sentence and the video jumps to that point. However ... a little proof-reading would help here: "To get things started" gets transcribed into "Ticketing, sir."


I love this line...

“Real 6120 also has an end-of-semester course project—in the self-guided version, your end-of-semester assignment is to change the world through the magic of compilers”


I wish there were more courses on compilers that also serve as language servers. The current world is moving towards a phase where a language needs IDE tooling such as compilers, linters, formatters, REPLs, language servers, debuggers and profilers.

Would love to see a high level course that covers how to build the tooling for a great developer experience.


As i mentioned in another thread, a uni course that has students work on an existing codebase instead of writing something from scratch might be even better - though harder to grade.

Imagine if one could make a major feature contribution to clangd/pyls/rust-analyzer as part of an advanced compilers course. The student gets a better understanding of (open-source) software development practices and gains a deeper proficiency in the language (not just the APIs); the project gains new contributors, and users (including the student herself) benefit from the new feature.

Double bubble!


My uni, NC State, has a course (CSC 326) called Software Engineering that has students contribute to an existing project created by the staff in teams, with the best contributions being merged into master for the next semester students to work on.


Same here in Munich: we had a course on software testing in OOP languages and had to write unit tests for some modules of Jenkins, which were graded and eventually merged into master.


That sounds pretty cool. Do you get a choice of what kind of project you want to contribute to?


Sorry - should've been more clear. It's a project written by the staff in Java and Angular. It's been used continuously through the years, evolving each semester with the best contributions being merged. I believe it's a health management service, I haven't taken the course yet.


Anyone interested in following the course together during the holidays? To keep each other motivated etc.


If you have to stop "early" please get through the simple DCE and LVN passes.

A solid basic-block level DCE & LVN optimization along with a judicious "compile time evaluator" (simple constant folding) will let you produce code at around 80% of the level of LLVM.

Obviously, you're not going to get the magic that comes of LLVM's scalar analysis pass, but you're not going to be embarrassed with your performance, either.
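A trivially dead-code-eliminating pass of the kind described above can be sketched in a few lines of Python (a hypothetical toy IR of `(dest, op, args)` tuples, not any particular course's representation): repeatedly drop pure instructions whose results are never read.

```python
def dce(block):
    """Dead-code-eliminate a basic block.

    Instructions are (dest, op, args) tuples; dest is None for
    effectful instructions (e.g. print/store), which are never removed.
    Iterate to a fixed point, since deleting one dead instruction can
    make another dead."""
    changed = True
    while changed:
        changed = False
        used = set()
        for _dest, _op, args in block:
            used.update(a for a in args if isinstance(a, str))
        new_block = [i for i in block if i[0] is None or i[0] in used]
        if len(new_block) != len(block):
            block, changed = new_block, True
    return block

prog = [("a", "const", [1]),
        ("b", "const", [2]),
        ("c", "add", ["a", "b"]),
        ("d", "add", ["a", "a"]),   # dead: d is never used
        (None, "print", ["c"])]
print(dce(prog))  # the "d" instruction is removed
```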


Yes! How should we organize this? Maybe start with an IRC channel?


Yes! an IRC channel would be great! Do you want to start it and post the info here? Or else I can start


I remember using IRC chans when doing coursera moocs, it was fun. See ya there


You can start :)


Discord seems faster and more likely to attract a small group nowadays.


Just created one here https://discord.gg/RBPmmdWg


Is this link still valid? It says the invite is invalid.


I'm interested too.


> Anyone interested in following the course together during the holidays? To keep each other motivated etc.

Hey, don't forget my little "advent of code" too:

https://github.com/pfalcon/python-ast-hacking-challenges/iss...

"Convert Python source code to SSA form"
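As a taste of what that involves, here's a hypothetical minimal sketch of SSA renaming for straight-line code only (the real challenge handles control flow, which additionally requires phi nodes; `to_ssa` and the statement format are invented for illustration):

```python
def to_ssa(stmts):
    """Rename variables in straight-line code to SSA form.

    Statements are (dest, uses) pairs of variable names. Each assignment
    gets a fresh subscripted name; each use refers to the latest
    definition. No branches, so no phi nodes are needed."""
    counter = {}   # how many times each variable has been defined
    current = {}   # variable -> its current SSA name
    out = []
    for dest, uses in stmts:
        new_uses = [current.get(u, u) for u in uses]
        counter[dest] = counter.get(dest, 0) + 1
        new_dest = f"{dest}.{counter[dest]}"
        current[dest] = new_dest
        out.append((new_dest, new_uses))
    return out

# x = ...; x = x + ...; y = x
print(to_ssa([("x", []), ("x", ["x"]), ("y", ["x"])]))
# [('x.1', []), ('x.2', ['x.1']), ('y.1', ['x.2'])]
```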


I'm interested. I'm writing a compiler-interpreter now but my broad compiler-construction chops are spotty and going through this systematically might be useful.


I would love to do so! I'm a masters student who has been wishing there was an advanced compilers course in my program, there's only the undergrad level one, so this is perfect! Let me know how I can get in contact.


I’m interested. I may not have the most relevant background, though — I’m a data scientist with an econometrics education. The topic really does interest me, though.


I'm a devops engineer. So not very related either!


I'm game. My background is in math and not CS, so I can't say I'm certain I have the prerequisite knowledge.


I would love to join as well. Loop me in too, if there is IRC channel


Created a discord channel https://discord.gg/RBPmmdWg


Do you have a group for us to join?


Okay, I just created this on Discord: https://discord.gg/RBPmmdWg. Considering it might be more accessible for people. Wondering if I should post this outside of the thread.


On a related note, what would you recommend as an introduction to compilers, say for beginners?


Crafting Interpreters has been great. It starts with an AST-walking interpreter (in Java), so you get to focus on parsing and the simplest form of interpretation, and then it moves to a bytecode interpreter (in C) so you get to learn about compiling to a more primitive target via a lower-level language (much of which should translate to a "real" compiler). Very approachable, has lots of side-context and an entertaining style. The language you interpret was custom-designed for the course, which helps a lot, and resembles a simplified JavaScript without all the messy parts.


Author here. Also you can read the whole book online for free: http://craftinginterpreters.com/


"Engineering a Compiler" is by far the best introductory book that is also modern.

"Advanced Compiling for High Performance" is one of the few good accounts of scheduling for non-toy architectures.

Muchnick's book is a good encyclopedia of optimisations, although the code is unreadable and buggy.


cs4410[1] is really really good, and cse131[2] is basically the same thing but with videos. I highly recommend them both.

[1]: https://course.ccs.neu.edu/cs4410/

[2]: https://ucsd-cse131-f19.github.io/


Compilers books: I'd start with "Engineering a Compiler" by Keith Cooper and Linda Torczon. http://craftinginterpreters.com/ is also a pretty great, programming-oriented intro, which may be good to work through alongside. For more on the analysis & compiler optimization side, "SSA-based Compiler Design" (http://ssabook.gforge.inria.fr/latest/; GitHub Mirror: https://github.com/pfalcon/ssabook) is a good follow-up.

Further readings: Book recommendations in https://github.com/MattPD/cpplinks/blob/master/compilers.md#... as well as program analysis resources (in particular lattice theory, type systems and programming languages theory, related notation): https://gist.github.com/MattPD/00573ee14bf85ccac6bed3c0678dd...

Courses: I can recommend the following: https://github.com/MattPD/cpplinks/blob/master/compilers.md#...

Particularly (in alphabetical order--I think these are all great, so including highlights of what I've liked about them):

- IU P423/P523: Compilers (Programming Language Implementation) - Jeremy Siek, with the course book "Essentials of Compilation: An Incremental Approach" (pretty interesting approach, with programming language features developed incrementally having a fully working compiler at each step, cf. http://scheme2006.cs.uchicago.edu/11-ghuloum.pdf; implementation language Racket),

- KAIST CS420: Compiler Design - Jeehoon Kang (good modern treatment of SSA representation itself, including the use of block arguments, https://mlir.llvm.org/docs/Rationale/Rationale/#block-argume..., as well as SSA-based analysis and optimization; Rust as an implementation language),

- UCSD CSE 131: Compiler Construction - Joseph Gibbs Politz, Ranjit Jhala (great lecturers, both Haskell and OCaml edition were interesting; fun extra: one of the Fall 2019 lectures (11/26) has an interesting discussion of the trade-offs between traditional OOP and FP compiler implementation),

- UCSD CSE 231: Advanced Compiler Design - Sorin Lerner (after UCSD CSE 131: for more on analysis & optimization--data flow analysis, lattice theory, SSA, optimization; fun extra: the final Winter 2018 lecture highlighted one of my favorite papers, https://pldi15.sigplan.org/details/pldi2015-papers/31/Provab...),

- UW CSE CSEP 501: Compilers - Hal Perkins (nice balanced introduction, including x86-64 assembly code generation, with the aforementioned "Engineering a Compiler" used as the course textbook).


Thank you so much for this comment. It's incredibly difficult to find modern compiler learning resources – when I ask, I often get disparaging comments about the dragon book and not much more.

Please, consider putting this list on your blog, or somewhere more visible than a HN comment.


> disparaging comments about the dragon book

How's that possible? It's a "keep it under your pillow" book. Whether you read it or not is a different question. I don't know anyone in the 'hood who wouldn't have.


I've been reading "Modern Compiler Design" currently; not too sure how beginner-friendly it is, though.

Also, I haven't made it too far into the book, so I can't really give a recommendation either way.


Stanford Compilers Course


Even if you don't do the course, the first paper they list is a fascinating must-read:

Producing Wrong Data Without Doing Anything Obviously Wrong! Todd Mytkowicz, Amer Diwan, Matthias Hauswirth, and Peter F. Sweeney. ASPLOS 2009.

https://dl.acm.org/doi/10.1145/2528521.1508275

I happened to be at that ASPLOS and I thought it was fascinating. Good to see it recognized as one of the most important papers in compilers.


Has anyone taken Compilers course from David Beazley: https://www.dabeaz.com/compiler.html. I'd love to hear your feedback, please


I took the undergraduate version of this course while at Cornell. It was a wonderful course. It taught me a lot about how computers work, and how to organize a large (for me at the time) software project.

One of the most painful parts of the project was using Scala, which had ridiculously long compile times that made iterating quite a pain.

My favorite memory was that for one project we had to submit a program meant to stress-test everyone's parser.


The course materials look good.

But what is a "PhD level course"? All I've seen is undergrad and graduate, and the difference between masters and PhD has nothing to do with classes.


Cornell student here who's taken a bunch of classes at all of the levels. In my experience, undergraduate and master's classes are taught with the expectation that you're taking the course as a requirement, alongside a full course load, so they're much more evaluation-heavy. PhD-level courses generally assume the course is the only one you're taking that semester, and that you're taking it because the material will be useful for your research rather than for the credential. So there's much less emphasis on evaluation: fewer homeworks, each much harder than in an undergraduate version of the same class, and final evaluations that are much closer to reading and understanding recent research papers than to a timed exam.


> the difference between masters and PhD has nothing to do with classes.

Huh? What's your education system? In Europe it's generally a 3-year bachelor's, then a 2-year master's, then a 3-year PhD, so when you have classes during your PhD, hopefully they're different from the ones you took during your master's.


Depends on the field. I studied philosophy (Ph.D. program, but finished with an M.A.), and the M.A. is often just a degree given after the first few years of the Ph.D. There are a handful of required first-year courses (logic, basic metaphysics, philosophy of science and ethics), but otherwise students take whatever courses are relevant to their interests. It helps that philosophy typically has fewer classes with explicit prerequisites. If you take an advanced course, you might be lost, but it doesn't feel the same as not knowing a formula or how to derive something.

Some courses are more basic, and some are more like collaborative research seminars. Advanced students gravitate towards the latter (and typically take fewer courses and/or audit more of them), but a first year with the right background can still take them. Conversely, an advanced student might take an introductory course in an area they lack background in to broaden their knowledge.

There are also terminal M.A. programs, but they're less common, and the majority of students who do Ph.D.s at prestigious institutions don't do one.


I am in the US, and at my school masters and PhD students all take the same classes.

It is not ideal because many of the masters students are coming from a non-CS background. Some of the graduate level CS courses I have taken were easier than 4th year level undergrad classes.


This tracks well with my experience also. I was, for the most part, disappointed in the several graduate CS courses I took for this reason.


By the time I'd finished graduate classes, I had a hard time with proofs. After the 10th or 12th introduction to predicate logic with slightly different notation, it just all becomes a big, confusing mass.


I honestly have never figured out European education systems. Sorry!

But this is Cornell, which I assume is like other research-oriented US universities. You have a 4 (or maybe 5) year undergraduate, leading to a Bachelors and then apply to graduate programs. The graduate schools may have specific Masters tracks, but not really any general Masters program. Instead, you spend the first year or two taking graduate classes until you either satisfy the assorted requirements or pass an oral qualification (They still have those, right?), after which you focus entirely on research, publications, and your dissertation defense. If you leave school before defending, you get the Masters as a sort of consolation prize---most US PhDs in my experience don't have Masters.


In the US, it's typically a 4 year undergrad, 2 year masters, and a ~5-7 year PhD depending on the program itself. From what I understand, while there are some classes you take as a PhD student, a majority of your work is doing research


For PhD students, the masters is integrated into the 5 years they spend doing their PhD. They are expected to take courses as a "pre-candidate" and do primarily research after they pass their qualifying exams and become candidates.


This is usually true but not always. I attended an economics program with a terminal master’s offering, and we spent a year and a half as master’s students before moving on to PhD work.


Just for completeness' sake: in the EU, a bachelor's is 3 years, a master's 1 or 2 (2 being the standard) and a PhD 3-5 years. It depends on the country, so it's possible to get 3+2+5.


There is no formal difference between masters and PhD courses. I think he calls it a PhD level course because it is "research oriented" and requires work that perhaps a Masters student looking to complete credits might not be interested in.

The evaluation criteria are also fairly subjective and rough, which might not work for a larger master's-level course.


Well, I don't feel like cleaning the basement or finishing my yard work before the snow, so I might just do this course over the holiday instead.


Any idea if I can jump right into this after my only experience being Crafting Interpreters?


Of course. Why not?


They should renumber it CS 6502.


Funny you mention the 6502, because as wonderful as its assembly is, it's not that friendly for higher-level languages. I tried writing a basic C-like language for it with 16-bit pointer support (see LDA ($ZP),Y), and came to the haunting realization that function calls require pushing the dereferenced zero-page pointers to the stack, which changes their dereferencing address space from 0-255 (e.g. main) to 256-512 for subsequent stack frames.

I then realized that, at a speed of 1 MHz, the pointer dereferencing, the stack-frame magic, and the endless copies just to parse basic expressions made the 6502 a pointless target for a C-like language.

It was fun though.


This course is asynchronously delivered.

Question for students: how do you find synchronous courses with some discussions during class, vs asynchronous courses where discussions are limited to an online written forum or office hours?


Asynchronous discussions are durable as content (on the internet). They are also very useful for working students. But they require focus to retain the material.


Not a problem at all. For non-introductory courses, this works well.

But there needs to be at least some synchronous element at some point. In one of my classes, we use Notion for discussion, so it has a live sense of conversation.


Listening to a Russian theoretical physicist mumble about how trivial various results are over a Zoom call really makes my day, put it that way.


I'm interested in the topic, but I'm a junior backend/data engineer. Do you think digging into compiler concepts could be helpful for my profile, beyond the general culture?


What would be some cool use cases to design your own compiler?


There are various weird and wonderful niche languages out there, if you appreciate programming languages as an end in themselves. Two examples: Sentient [0] and Factor [1][2].

[0] https://sentient-lang.org/

[1] https://factorcode.org/

[2] Factor's home-page does a poor job of presenting examples to give a flavour for the language, so here's one on GitHub: https://github.com/factor/factor/blob/master/basis/roman/rom...


Sentient looks neat! Does it use the same strategies as Prolog under the hood?


Good question, I'm afraid I don't know enough about either language to answer that.

The Sentient pages mention Prolog, for what that's worth: https://sentient-lang.org/intro/declarative


> The MiniSat solver is Sentient’s default. It is a version of MiniSat that has been compiled to JavaScript with Emscripten. This means that it can run in a web browser and it does not have any dependencies when run from the command-line.


The simplest case where they prove useful is usually when your regex grows longer than 20 characters. Write a parser once, and it'll be readable to people other than yourself (if it isn't, your schema is bad).
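For instance, a tiny hand-rolled parser for `key=value` pairs stays readable and gives clear error points where the equivalent regex grows opaque (a hypothetical sketch; `parse_pairs` is an invented name):

```python
def parse_pairs(text):
    """Parse 'key=value;key=value' into a dict, reporting malformed chunks."""
    result = {}
    for chunk in text.split(";"):
        if not chunk.strip():
            continue  # tolerate empty segments, e.g. trailing ';'
        key, sep, value = chunk.partition("=")
        if not sep:
            raise ValueError(f"missing '=' in {chunk!r}")
        result[key.strip()] = value.strip()
    return result

print(parse_pairs("host=localhost; port=8080"))
```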


Seems a pity to throw out the various advantages of the high-level solution just because your regex engine doesn't support a scalable syntax.

Advanced regex systems such as the Ragel parser-generator support a composition-friendly syntax, [0] but the average regex system doesn't, and I guess there's no quick fix for that.

[0] http://thingsaaronmade.com/blog/a-simple-intro-to-writing-a-...


A (not-)compiler can be made to look for issues in a codebase that already exists. It would need all the tools of a compiler, just with the target language being the same as the source language.


Nice, although I'd have wanted a segment on compiler testing.


Indeed; I previously had these papers on the list but had to take them out for time this semester:

- Finding and understanding bugs in C compilers. Xuejun Yang, Yang Chen, Eric Eide, and John Regehr. PLDI 2011. https://dl.acm.org/citation.cfm?id=1993532

- Compiler validation via equivalence modulo inputs. Vu Le, Mehrdad Afshari, and Zhendong Su. PLDI 2014. https://dl.acm.org/citation.cfm?id=2594334


There are just too many fun things to teach.

More recent paper: yarpgen

https://github.com/intel/yarpgen/blob/main/papers/yarpgen-oo...


Oh, compiler construction :) I really enjoyed that lecture during my M.Sc.

"parallelization, just-in-time compilation, and garbage collection" sound fun, too. Especially JIT.


It's great that there's a lecture on dynamic compilation - so often overlooked.


This is amazing. Thank you for sharing.


No code generation?


It looks like it's more about generating IR. The professor wrote his own LLVM-like tool called Bril, and use of the tool seems to be the focus of the course.


What a treat. Thanks for this submission.



