I’ve worked on multiple compilers (as an optimizations expert) at MSFT on VS and CUDA, and have developed a DSL and worked on Database Compilers.
I can’t hire compiler people with the right skills.
We’re building an IDE and those parsers are written differently, and we use Scala packrat parser combinators.
These courses teach very little judgement or industry-relevant stuff. When do you use a packrat parser vs LALR vs LL? Good luck hiring an engineer with advanced compiler courses knowing any of this.
I’d like to sit down all university professors who teach compiler courses and teach them a course on what’s relevant.
> I’d like to sit down all university professors who teach compiler courses and teach them a course on what’s relevant.
I don't teach compiler courses, but I'm an academic, and I do keep in touch with industry people to find out what's important for my data analysis course. A couple of big problems with your proposal:
1. Time variation in "what industry wants". This year one thing's hot, the next year it's something else. A bare minimum requirement for any course is that a student taking it in their second year should learn something that's still relevant three years later when they're in the first year of their job.
2. Cross-sectional variation. There's no such thing as "what industry wants". Every place wants something different, with only a small subset common to 80% of employers.
My two cents - what’s useful for employers is giving students the confidence to jump into the code and figure it out. By this measure, what’s relevant for a compilers course isn’t the material itself. It’s that it’s structured to force the students to figure certain things out for themselves.
There are college hires who need to be spoon fed, and college hires who just need mentorship and direction. Group two are invaluable.
Andy Pavlo of CMU DB fame echoed this in one of the intro video lectures to one of his courses. He said his contacts in industry valued people who can navigate a large codebase productively. Most uni courses seem to involve writing a "toy" solution from scratch, which is seldom the case in professional programming.
He designed the course to be an exercise in working in an existing codebase that one needs to understand before adding features.
I guess that is one of the many reasons CMU grads kick open the door into many companies.
I am currently in this class and can confirm that navigating the codebase is a nightmare. The specifications we are given in the relevant assignments are vague or wrong and the code itself is pretty questionable. I spend more time trying to understand the assignment and current code than actually writing anything myself. If that's what he's going for, I guess he succeeded.
Sounds like you got yourself a high quality code base!
It's normal for a code base to be over a decade old, during which time it was worked on by a team of a dozen positions, each of which turned over on average every 3 years. So now you've had a few dozen people, each with their own ideas of what's a clear way to express a problem, trying to solve evolving business requirements under tight deadlines. And there's a 50/50 chance the project originally started as a 'throwaway' proof of concept.
> Most uni courses seem to involve writing a "toy" solution from scratch, which is seldom the case in professional programming.
Without writing a "toy" from scratch, one would never understand professional programming or codebases either. They wouldn't understand why they oftentimes poorly designed, full of bugs, etc. Tracing own "from scratch" steps, with self-reflection, is the best way to be prepared to "professional" programming. First, make and find mistakes yourself, only then you will be ready to find mistakes in, and improve on, other people's stuff.
>>Without writing a "toy" from scratch, one would never understand professional programming or codebases either. They wouldn't understand why they oftentimes poorly designed, full of bugs, etc. Tracing own "from scratch" steps, with self-reflection, is the best way to be prepared to "professional" programming.
I disagree. All the toy-from-scratch approach provides is a way to get the student to go from zero to a poorly designed solution during the course, which they can just forget ever existed as soon as the grade is in.
Meanwhile, you have no experience onboarding onto an unfamiliar codebase, finding and fixing a bug, implementing a feature, or refactoring to reduce technical debt and improve the code according to some criteria. And that's at least 3/4ths of a developer's work.
ibains made a specific point about language processing in IDEs being different to that of (traditional) compilers. Presumably there exists a state-of-the-art there just as there's a state-of-the-art for conventional compilers, but academia doesn't give it as much attention.
> 2. Cross-sectional variation. There's no such thing as "what industry wants". Every place wants something different, with only a small subset common to 80% of employers.
I don't think that really holds up. Ultimately you could say this about any small mismatch between desired skill-sets and those available, but at some point we call it hair-splitting.
If you need someone to build a parser for your IDE, you're presumably better off with an expert in traditional compilers, than with someone who knows nothing at all of language processing. You'd be better off still with someone experienced in language processing within IDEs. Less mismatch is better, even if it's impractical to insist on no mismatch at all.
Your IDE parser will be unusable if it goes bananas while you're typing the characters needed to get from one fully, correctly parseable state to the next.
It needs to be able to handle:

    printf("hello");

and also:

    prin

and also:

    printf("He
It also needs to be able to autocomplete function signatures that exist below the current line being edited, so the parser can't simply bail out as soon as it reaches the first incomplete or incorrect line.
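A minimal sketch of one standard answer to that - panic-mode recovery, where a failed statement becomes an error node and the parser skips ahead to a synchronization token, so the rest of the file still parses and semantic analysis can continue (toy grammar, illustrative names, Python):

    # Minimal sketch of panic-mode error recovery: when a statement fails
    # to parse, emit an error node and skip to a synchronization token
    # (here ';'), so the rest of the file still gets parsed and analyzed.
    # All names here are illustrative, not from any particular IDE.

    def parse_program(tokens):
        """tokens: list of strings; returns a list of AST nodes."""
        nodes, i = [], 0
        while i < len(tokens):
            node, i2 = parse_statement(tokens, i)
            if node is not None:
                nodes.append(node)
                i = i2
            else:
                # Recovery: swallow tokens up to and including the next ';'.
                j = i
                while j < len(tokens) and tokens[j] != ';':
                    j += 1
                nodes.append(('error', tokens[i:j]))
                i = j + 1
        return nodes

    def parse_statement(tokens, i):
        # Toy grammar: statement := IDENT '=' IDENT ';'
        if (i + 3 < len(tokens) and tokens[i].isidentifier()
                and tokens[i + 1] == '=' and tokens[i + 2].isidentifier()
                and tokens[i + 3] == ';'):
            return ('assign', tokens[i], tokens[i + 2]), i + 4
        return None, i

    # Even with a broken first statement, the second one still parses:
    print(parse_program(['prin', ';', 'x', '=', 'y', ';']))
    # [('error', ['prin']), ('assign', 'x', 'y')]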
> Good error recovery / line blaming is still an active field of development.
True. But let's get the terminology straight: that's not compiler science, that's parsing science. And it's no more compiler science than parsing a natural language is.
What terminology are you talking about? Neither "compiler science" nor "parsing science" are terms I used, or that the industry or academia use.
Parsing - formal theory like taxonomies of grammars, and practical concerns like speed and error recovery - remains a core part of compiler design both inside and outside of academia.
How can you be sure that that } is the end of a certain defined block? This most importantly affects scoping, and in many cases it's ambiguous. IDEs do have rich metadata beyond the source code, but then parsers need to be aware of it.
Maybe my wording is not accurate; imagine the following (not necessarily idiomatic) C code:
    int main() {
        int x;
        {
            int x[];
            // <-- caret here
        x += 42;
    }
This code doesn't compile, so the IDE tries to produce a partial AST. A naive approach will result in the first } matching with the second {, so `x += 42;` will cause a type error. But as noticeable from the indentation, it is more believable that there was, or will be, a } matching the second { at the caret position, and that `x += 42;` refers to the outer scope.
Yes, of course parsers can account for the indentation in this case. But more generally this kind of parsing is sensitive to a series of edit sequences, not just the current code. This makes incremental parsing a much different problem from ordinary parsing, and also is likely why ibains and folks use packrat parsing (which can be easily made incremental).
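As a toy illustration of the indentation heuristic described above (purely illustrative - real IDEs combine this with edit history and incremental reparsing):

    # Sketch of the indentation heuristic described above: if braces don't
    # balance, guess that the missing '}' belongs at the first subsequent
    # line whose indentation drops back to the level of the unclosed '{'.
    # Purely illustrative; real IDEs combine this with edit history.

    def guess_missing_brace(lines):
        """Return (line_index, indent) where a '}' is probably missing."""
        stack = []  # indentation levels of currently open '{'
        for idx, line in enumerate(lines):
            indent = len(line) - len(line.lstrip())
            while stack and line.strip() and indent <= stack[-1]:
                # Line is back at (or left of) an open brace's indent:
                # either it closes the brace, or a '}' is missing here.
                if '}' not in line:
                    return idx, stack[-1]
                stack.pop()
            if line.rstrip().endswith('{'):
                stack.append(indent)
        return None

    src = [
        'int main() {',
        '    int x;',
        '    {',
        '        int x[];',
        '    x += 42;',   # indentation fell back: '}' likely missing above
        '}',
    ]
    print(guess_missing_brace(src))  # (4, 4)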
Partial parse state and recovery are critical. You don't want the entire bottom half of a file to lose semantic analysis while the programmer figures out what to put before a closing ).
Packrat parsers are notably faster than recursive descent parsers (also critical for IDE use) and by turning them "inside out" (replacing their memoization with dynamic programming) you get a pika parser which has very good recovery ability.
There are varying techniques to improve error recovery for all forms of parsing, but hacked-up recursive descent (certainly the most common kind of parser I still write for my hacked-up DSLs!) has poor error recovery unless you put in the work. Most LR parsers are also awful by default.
When I was in university most focus was on LL and LR parsers with no discussion of error recovery, and more focus on memory usage/bounds than speed. I also have no idea how common it is to teach combinator-based parser grammars these days; stuff like ANTLR and yacc dominated during my studies. This would add another level of unfamiliarity for students going to work on a "real compiler".
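For readers who only saw yacc-style tools in school: a packrat parser is essentially PEG-style recursive descent plus a memo table over (rule, position) pairs, which is what buys the linear-time guarantee. A toy sketch (grammar and names illustrative only):

    from functools import lru_cache

    # Toy packrat parser: ordinary PEG-style recursive descent, made
    # linear time by memoizing (rule, position) pairs. Toy grammar:
    #   expr := term ('+' term)*
    #   term := '(' expr ')' | digit

    SRC = '(1+2)+3'

    @lru_cache(maxsize=None)   # the memo table is what makes it "packrat"
    def term(pos):
        if pos < len(SRC) and SRC[pos].isdigit():
            return pos + 1     # success: return new position
        if pos < len(SRC) and SRC[pos] == '(':
            end = expr(pos + 1)
            if end is not None and end < len(SRC) and SRC[end] == ')':
                return end + 1
        return None            # failure (also memoized)

    @lru_cache(maxsize=None)
    def expr(pos):
        end = term(pos)
        if end is None:
            return None
        while end < len(SRC) and SRC[end] == '+':
            nxt = term(end + 1)
            if nxt is None:
                break
            end = nxt
        return end

    print(expr(0) == len(SRC))  # True: the whole input parses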
> Packrat parsers are notably faster than recursive descent parsers
I think this needs to be qualified. I don't think a packrat parser is going to beat a deterministic top-down parser. Maybe the packrat parser will win if the recursive descent parser is backtracking.
True - the performance of a top-down parser is going to depend on if/how often it backtracks and how much lookahead it requires. But this requires control over your grammar which you might not have i.e. you are parsing a standardized programming language. Practically speaking, unless you want two separate parsing engines in your IDE, that Lisps are actually LL(1) and Python and Java are "almost LL(1)" doesn't get you closer to parsing C++.
Packrat parsers are equivalent to arbitrary lookahead in either LL or LR grammars.
Tree sitter is based on LR parsing (see 23:30 in above video) extended to GLR parsing (see 38:30).
I've had enough of fools on HN posting unverified crap to make themselves feel cool and knowledgeable (and don't kid yourself that you helped me find the right answer by posting the wrong one). Time I withdrew. Goodbye HN.
There is no such variance and no new fashions in compilers every year. There are hardware compilers and tools, including language servers, primarily. The move to cloud requires source-to-source compilers for data and transforms.
> There is no such variance and no new fashions in compilers every year.
Indeed. Packrat is exactly an example of a passing fad in parsing. I find it telling that you, an "optimizations expert", are so worried about ways to parse. Shows that "new fashions every year" have reached the compiler industry too.
Hiring for exact skills is what you do when you need the skills ASAP and the actual supply of engineers is there. The time you waste looking for someone with the exact knowledge you need could have been spent getting a good compiler engineer and just teaching them what you know. And magically that good engineer will get better, it's pretty amazing stuff.
I've been told multiple times: university/school teaches you how to learn. You get your actual knowledge and skills on the job. What I learned in 3 years of university was nothing compared to what I learned in 6 months on the job.
At first, it sounds like "win/win" if universities trained their students to use the same tools companies want. Companies would save money for training, and graduates would be ready for their jobs.
But in the longer term, it is actually "win/lose", because if the fashionable tools change three years later, the companies that optimize for saving money on training will simply fire their existing employees and hire new graduates.
For the students, it is better to be the kind of person who understands the deep principles and can learn new tools as needed.
And the companies have a choice: either offer job training, or offer higher salaries to people currently working in other companies who already have the needed skills. (Or whine about market shortage and import H1B servants.)
I will second this. I would much rather hire an engineer who has a related background and wants to learn over an engineer with the exact skillset but does not want to learn. It may take a few months, but the former engineer will pick up what you want them to know (if you mentor them correctly), and past that inversion point, they are immensely more valuable.
You are thinking about it the wrong way. In this niche, you will almost never find people with the exact skills you are looking for. Better to find people with tangentially related skills and experience and let them grow into the role. Academia is never going to be comprehensive enough to prepare students for every niche that has some demand. Rather, they should just focus on teaching a core that will help them learn what they need later on demand.
> When do you use a packrat parser vs LALR vs LL?
If you need ok performance and good error recovery (especially if you are doing stuff in an IDE!), none of the above: recursive descent is not that hard to implement, and is incredibly flexible.
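To back that up, here is roughly how little code a hand-written recursive descent parser takes - a toy two-level expression grammar in Python, with precedence and error reporting decided in plain code (all names illustrative):

    # Hand-written recursive descent for a toy grammar, illustrating the
    # flexibility: precedence, lookahead and error messages all live in
    # ordinary code you fully control.

    class Parser:
        def __init__(self, tokens):
            self.toks, self.i = tokens, 0

        def peek(self):
            return self.toks[self.i] if self.i < len(self.toks) else None

        def expect_num(self):
            t = self.peek()
            if t is None or not t.isdigit():
                raise SyntaxError(f"expected a number at token {self.i}, got {t!r}")
            self.i += 1
            return int(t)

        def expr(self):            # expr := term ('+' term)*
            v = self.term()
            while self.peek() == '+':
                self.i += 1
                v += self.term()
            return v

        def term(self):            # term := NUM ('*' NUM)*
            v = self.expect_num()
            while self.peek() == '*':
                self.i += 1
                v *= self.expect_num()
            return v

    print(Parser(['2', '+', '3', '*', '4']).expr())  # 14: '*' binds tighter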
I agree. Engineers are under-educated. This is not going to improve.
Poor and middle class kids don't have the luxury to soak up as much education as they feel like before choosing a topic they are interested in. That is a privilege reserved for the wealthy.
We go to schools that are pragmatic, teach us some basics, and then get us out the door to do cog work.
Look at all of the comments you have drawn. Let me spell it out for you: your expectations are too high.
Society is not designed to create compiler engineers you can hire fresh out of school. It is churning out bright scrappy people who are hungry for class mobility, a decent standard of living, and an interesting job to smash their brain against.
Maybe look for one of those instead of expecting a perfect candidate to drop out of the sky. You have a cool-as-hell project, so I'm sure you could find a brilliant person willing to learn.
The problem is that some engineers are ideological; you don’t want to work with those. If you have this argument during an interview and the interviewing engineer insists that X is always the right choice, run away.
Anytime a commercial language gets bogged down in parsing, unless that is core to the product (and mostly it isn’t), also run away. Parsing represents like 1% of the work that needs to be done on a compiler, even if it’s an IDE parser that needs to be really incremental (the only worse thing is to get stuck on lexing).
Do you think it is the responsibility of academia to teach "industry relevant stuff"? I agree that courses generally don't teach judgement well, but I do think that it is the workplace/industry's responsibility to train their engineers and not expect fresh grads to be fully equipped to work on something that specific.
> Do you think it is the responsibility of academia to teach "industry relevant stuff"?
I don't think that was what ibains was going for, though I don't fault you for seeing this in that comment. (Especially as I don't think this course suffers from that problem, and generally I think things are rapidly improving on this front.)
The problem, to me and I think to him too, is that quite a lot of the things such courses teach as well-thought-out, well-working solutions just don't work very well, and I frequently find the courses' comments on what is a 'good taste' solution and what isn't to be completely at odds with my understanding of the problem space.
E.g. in parsing (as that's the topic he mentioned): first, lots of people just spend way too much time on it. And they focus on parts that are of zero use to beginners (like explaining all the grammar families) and then use obtuse parser generators that save no work, and sometimes use them in bizarre ways (like picking an LALR parser generator and then hand-editing the output to support a not-quite-LALR language). Meanwhile, a recursive descent parser is easy to write, fast, and gives pretty good error messages with _very_ little work. You do need to know enough about grammars to be able to write one down and tell whether it describes an ambiguous language etc., so this should be taught, but you don't need to understand the grammar zoo well.
You might believe academia should not impart knowledge students require for their subsequent jobs. Ask the students why they go to university.
See what is taught at UC Berkeley, with Spark and everything else coming out of there. I took a systems course with Eric Brewer - totally amazing - the context I got.
I think the ideal is really that a student should be capable of going into the world and picking up a new technology within the same area without too much trouble — I may have used ANTLR for a class, but I should also understand it well enough to write my own parser, or start looking into other parsers.
But if the gap is such that knowing ANTLR doesn’t really help me use packrat parsers (haven’t used them myself, so I don’t know) then it’s a gap probably worth filling.
But ideally students should learn the fundamental architecture/algorithms, not the specific tools, popular algorithms & implementations, and all the accidental complexity that came with it.
Which is where the industry vs academia problem usually comes in — industry often only cares about knowledge of a particular tool, and only sometimes cares about knowledge-transferability (I think largely because HR exists, and operates independently of the actual engineer making the request, with the first pass review).
The company hires for Hadoop, the university teaches distributed systems, and ideally the student can be trained in Hadoop specific in short order.
India also offers an example of the extreme alternative — heavily industry-driven courses focusing on specific tools — and you end up with students having practical knowledge of current tooling... but no ability to transfer it. Like the kind of resource who refuses to program in C# because they're a Java developer, and it's all they know.
Strictly speaking, D is not context free, nor is any statically typed programming language. That said, most languages, including D, have supersets that are context free and can be used to build an AST, which can then be refined further with context-sensitive checks (i.e. variables are used after declaration) and then even further with type checks.
Many language tools don't need a full blown language parser and can get by with a context-free parser for the language's superset. Things like autocompletion, spell checking, documentation/highlighting can all be implemented using context-free parsers.
I suppose we could argue about the definition of context free, but the compiler lexes it without any reference to the parser, and the parser builds the AST without any reference to symbols or semantic meaning.
> spell checking ... using context-free parsers
Only up to a point. D's spell checker uses the symbols in scope as its "dictionary". This means it is very good at guessing what you meant instead of what is typed, as the dictionary is very targeted. I.e. it uses the context, and the spell checker is part of the semantic pass.
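A minimal sketch of that idea (illustrative only, not D's actual implementation): the candidate dictionary is just the identifiers currently in scope, and the closest one by similarity wins:

    import difflib

    # Sketch of scope-aware spell checking as described above: the
    # "dictionary" is just the symbols currently in scope, so suggestions
    # are few and highly relevant. Illustrative, not D's actual code.

    def suggest(unknown_name, symbols_in_scope):
        matches = difflib.get_close_matches(unknown_name, symbols_in_scope,
                                            n=1, cutoff=0.6)
        return matches[0] if matches else None

    scope = ['parse_expr', 'parse_term', 'token_index', 'peek']
    print(suggest('parse_exp', scope))  # 'parse_expr'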
A context free grammar has a formal and rigorous definition. One way to see that Dlang does not have a strictly context free grammar for its AST is the following snippet of code:
    A[B] C;
That is parsed differently depending on whether B is a constant integer expression, in which case it parses to the declaration of an array of type A with a size of B, or whether B is a type name, in which case it parses into a hash map with keys of type B and values of type A.
This disambiguation cannot be specified by the context-free grammar and instead must be delayed until semantic analysis (which is usually defined by a context-sensitive grammar).
Pretty much all statically typed languages work this way with various kinds of ambiguities. In practice, it's not a particularly big deal. But that's kind of my point, there's no need to work exceedingly hard to force a language construct to be context free, especially if making it context free results in a lack of readability or awkward syntax.
It's perfectly fine to specify a superset of the language that is context free, parse an AST for the superset of that language and then resolve AST ambiguities in later phases, such as the semantic analysis phase or the type checking phase. Almost any tool written to assist an IDE with autocompletion or other functionality will have no problem working with the language superset.
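To make the "parse the superset, disambiguate later" approach concrete, a toy sketch (hypothetical node and function names): the parser emits one deliberately ambiguous node, and semantic analysis rewrites it once B can be looked up:

    # Sketch of "parse the superset, disambiguate later" for `A[B] C;`:
    # the parser emits one deliberately ambiguous node; semantic analysis
    # looks B up in the symbol table and rewrites the node. Illustrative.

    def parse_decl(a, b, name):
        # Grammar-level parse can't tell array from hash map: keep it vague.
        return ('ambiguous_bracket_decl', a, b, name)

    def resolve(node, symtab):
        kind, a, b, name = node
        assert kind == 'ambiguous_bracket_decl'
        if symtab.get(b) == 'type':
            return ('hashmap_decl', {'key': b, 'value': a, 'name': name})
        else:  # B is a constant expression
            return ('array_decl', {'elem': a, 'size': b, 'name': name})

    node = parse_decl('A', 'B', 'C')
    print(resolve(node, {'B': 'type'}))    # hash map of A values keyed by B
    print(resolve(node, {'B': 'const'}))   # array of A with size B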
You realize the poster you are responding to is the creator of D? And has spent his career writing compilers? I'm not sure he needs to have someone explain what a context free grammar is.
I understand you wish to flatter WalterBright, but please note that your comment is fairly toxic and not well suited to Hacker News. Please stay on topic and feel free to contribute to the points being made. This kind of hero worship often just degrades discussion and is better left to other forms of social media.
Yours is the toxic comment. Appeal to authority is fine if the authority is an authority and WB has proven his. Suggestions of intention to flatter and accusations of hero worship are pretty gross. @kingaillas's point is entirely valid IMO.
> Most languages are context sensitive. Most language tools are context free. How did we go so wrong?
We never did. We always knew that any real-world useful grammar is multi-level and attributed (well, it's constraint-based, though depending on your definition of "attribute grammar", those are equivalent). That's why we like recursive-descent parsers so much: adding multi-level constraints to them is easy - you have a full Turing-complete language at your disposal, unlike the simplistic 1-dimensional DSLs of most parser generators.
Context-free grammars are ripe for theoretical computer science work even if they're not practically relevant. On the flipside, I suppose the constraints of context-free grammars are seen as a price not worth paying when designing a language.
Languages that are context-sensitive? C, C++, Java, JavaScript.
Examples of tools based on the starting expectation that languages are context-free? Yacc, Bison, Jay.
Examples of the problems this causes? Well we're using the wrong tool for the job, right from the start. Instead of using an appropriate tool the first thing we do is bend our tools out of shape. We use side-effect actions to subvert the tool's model in an uncontrolled way. We don't get the full benefits of the tool's original model and can't rely on its guarantees.
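The canonical example is C's typedef ambiguity, traditionally worked around with the "lexer hack": the lexer consults the symbol table, exactly the kind of feedback loop the context-free model doesn't account for. A sketch (toy token shapes, illustrative only):

    # The classic "lexer hack" in miniature: C's `T * x;` is a declaration
    # if T is a typedef name and a multiplication otherwise, so yacc-based
    # C parsers feed the symbol table back into the lexer. Sketch only.

    typedef_names = {'size_t'}   # filled in by earlier declarations

    def lex(word):
        if word in typedef_names:
            return ('TYPE_NAME', word)   # context-sensitive decision
        if word.isidentifier():
            return ('IDENTIFIER', word)
        return ('OP', word)

    def classify(stmt):
        toks = [lex(w) for w in stmt.split()]
        if toks[0][0] == 'TYPE_NAME' and toks[1] == ('OP', '*'):
            return 'declaration of pointer'
        return 'expression (multiplication)'

    print(classify('size_t * x'))   # declaration of pointer
    print(classify('count * x'))    # expression (multiplication)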
> Do you think it is the responsibility of academia to teach "industry relevant stuff"?
In college I learned a lot of math and how to do things like calculate stress and bending. I learned nothing about material finishes, material selection, fastener selection, tolerances, drafting, etc. But the latter was easy to pick up on the job, while learning mathematical analysis on the job just doesn't happen.
And learning PHP, React, and MongoDB is easy to do on the job.
However, I think that there are industry practices which may inform what students are taught.
For example, process algebra is rarely (if ever?) used in industry. There are other formalisms of comparable difficulty and academic interest which are used however, perhaps teach those instead?
I definitely think there ought to be courses on such things available. There should be an educational route between pure academia, which focuses on esoterica that is only relevant to research, and vocational courses that don't include much in the way of theory at all. The idea that industry ought to pay for it presumably comes from a background where universities are private and expensive. But I'd like to live in a world where these kinds of courses were government-subsidised and available for free or cheap.
> The idea that industry ought to pay for it presumably comes from a background where universities are private and expensive.
In the US at least, where even public universities are ridiculously expensive, there's no way to keep the current system running unless our grads are able to get jobs that pay well when they graduate. Indirectly industry is paying for the courses taught by universities. In the current environment we need to prepare students for employment.
> I'd like to live in a world where these kind of courses were government subsidised and available for free or cheap.
I think we'd be better off as a country (US). I also think that ship has sailed, and it sank to the bottom of the ocean a couple of decades ago.
I tend to think of most university education as taxpayer and student subsidized versions of the sort of training that companies should be doing. I think, for most programmers, a well-thought-out apprenticeship program would be better both for them and for the company, except that the amount of money we send to universities makes this economically infeasible.
> Do you think it is the responsibility of academia to teach "industry relevant stuff"?
I mean, there are colleges that cater specifically to industry. Digipen is a good example of this: they have very close relationships with game studios and shoehorn their curriculum into the requirements the state imposes on colleges.
These sorts of colleges are not particularly popular, however, so I think most prospective students understand the value of a more broad and timeless approach to software education.
I teach compilers at a big university. And I would love to hire graduates with compiler skills for my startup, but find it difficult.

There are several structural problems that conspire to keep students from acquiring knowledge in a low-level programming domain like compilation: a course needs to fit with the rest of the curriculum, and with student interest. I cannot get students interested in lexing and parsing, because they hear all day how exciting deep learning and big data are. Lexing and parsing are deemed solved since the 1960s, a sentiment I disagree with. In addition, classical theoretical computer science (e.g. automata and languages) is rarely taught with enough depth in many universities, so students lack the technical background. You can cover this in a compilers course (I do), but it takes time, and the students see it as an ordeal. Compilers as a subject is not helped by the most widely recommended book being the Dragon Book, which is dated and way too long to be useful for beginners. Compare the Dragon Book with the material available as an introduction to machine learning... Many of the existing textbooks also fail to cover modern compilation themes, in particular JITs and compilation for multi-core and GPUs. I'd say there is currently no good undergraduate textbook on compilers. I could probably, with a lot of work, cobble something reasonable together from my lecture notes, but I don't believe in books; I'd like to do a good MOOC, but building suitable software infrastructure requires more work than I am currently willing to put in.

My department tried hard to remove the compilers course, as "no longer relevant" and "too hard". I threatened to resign if it was not kept as a mandatory course. For now this has worked.
I never took a compilers class during university study and it is my greatest regret. I've since been self learning and it hasn't been easy.
Thanks for advocating for the importance of compilers. This lack of knowledge has been a limiter for me many times. I'd really like to fill a skill gap and be able to write a DSL or a little language to solve a problem.
From my experience in industry, while hardly any programmer will ever need to write a modern code generator, few things are more useful in practice than being able to quickly design a suitable DSL and implement a corresponding performant and correct lexer and parser. This is not something that can easily be picked up on the side.
In addition, understanding what happens under-the-hood when you program in a high-level language is an eye-opening experience for many students. This sentiment comes up in student evaluations year-in, year-out.
I never understood this. I took the compilers class at CMU, and I loved it. I went on to take a follow-up class implementing a compiler with higher-order types. Meanwhile, most of my classmates avoided compilers altogether.
I've worked in the industry since undergrad, and I've implemented several interpreters for DSLs. Sometimes I was asked to make it, but other times I had the flexibility to dream up a DSL on my own, and it ended up becoming indispensable to users. I've always loved building them, but to this day, parsing is still the most boring part to me.
I recently went through Bob Nystrom's Crafting Interpreters, and it was great fun. I implemented an interpreter in Rust. Material about compiling for GPUs would be interesting. But for me, personally, I really wish there were something updated for how IDEs use languages nowadays, in particular, walking through a good way to implement the language server protocol.
The calculator language that every compiler book starts with should be LSP-first.
A MOOC would be great, but I'd also be happy with a series of blog posts.
Does anyone know of good material bridging the gap between traditional compiler architecture and the LSP?
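Absent such material, the bridge is at least conceptually small: the front end stops printing errors and instead returns LSP-style diagnostics (0-based line/character ranges) suitable for a textDocument/publishDiagnostics notification. A toy sketch of that shape (toy checker, transport layer omitted):

    import json

    # Sketch of the compiler/LSP bridge: instead of printing errors, the
    # front end returns LSP-style diagnostics (0-based line/character
    # ranges) that a language server would send in a
    # textDocument/publishDiagnostics notification. Toy checker only.

    def check(source):
        diagnostics = []
        for line_no, line in enumerate(source.splitlines()):
            col = line.find('@')          # toy rule: '@' is illegal
            if col != -1:
                diagnostics.append({
                    'range': {'start': {'line': line_no, 'character': col},
                              'end': {'line': line_no, 'character': col + 1}},
                    'severity': 1,        # 1 = Error in the LSP spec
                    'message': "unexpected character '@'",
                })
        return diagnostics

    notification = {
        'method': 'textDocument/publishDiagnostics',
        'params': {'uri': 'file:///calc.txt', 'diagnostics': check('1 + @')},
    }
    print(json.dumps(notification, indent=2))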
> Compilers as a subject is not helped by the most widely recommended book being the Dragon Book
Is it? Even I, an outsider [to the modern university courses], see that the baseline has quite shifted to the Tiger Book (aka Appel - Modern Compiler Implementation in [your choice]).
Is it at all possible that compilation is traditionally taught back to front? I am just a hobbyist; my CS degree did not include anything about compilation other than the theory of grammar types. I am the first to put my hand up and say I do not really know what I am talking about.
For me, compilers are interesting because of
a. optimisations; at my level simple logic that is obviously advantageous to a CFG.
b. interaction with the dependencies and operation of low level hardware; how does it REALLY work.
In the hobbyist pursuit of an understanding, I have become belligerent towards parsing. Parsing is interesting. But also not really. I would have benefited substantially from an abstract introduction via graphs, analysis and lowering - both in terms of interest and fundamental understanding.
Thanks. I'm actually very interested in lexing and parsing, because that's probably 99% of the stuff a layman programmer gets to do with compiler theory on the job. I mean, not everyone gets the chance to write the backend part, but parsing skill is used almost universally.
A fun exercise is writing a lexer generator or a parser generator. It's probably easier to start out with writing a lexer generator like Flex for your favourite language. I recommend using the Flex input format; then you can test your creation using Flex as a testing oracle! This is not something you can do in a weekend.
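To give a feel for the kernel of the exercise, here is a toy scanner loop in Python - note it leans on the re module and first-match-wins alternation, whereas real Flex builds a DFA and takes the longest match, which is where the actual work of the exercise lies:

    import re

    # Kernel of a Flex-like lexer generator: compile (name, regex) rules
    # into one alternation with named groups, then repeatedly match at the
    # current position - the loop a generated scanner runs. Note: Python's
    # re uses first-match alternation; real Flex takes the longest match.

    RULES = [
        ('NUMBER', r'\d+'),
        ('IDENT',  r'[A-Za-z_]\w*'),
        ('OP',     r'[+*/=()-]'),
        ('SKIP',   r'\s+'),
    ]

    def make_lexer(rules):
        master = re.compile('|'.join(f'(?P<{n}>{p})' for n, p in rules))
        def lex(text):
            pos = 0
            while pos < len(text):
                m = master.match(text, pos)
                if not m:
                    raise SyntaxError(f'bad character at {pos}: {text[pos]!r}')
                if m.lastgroup != 'SKIP':
                    yield (m.lastgroup, m.group())
                pos = m.end()
        return lex

    lex = make_lexer(RULES)
    print(list(lex('x1 = 42 + y')))
    # [('IDENT', 'x1'), ('OP', '='), ('NUMBER', '42'), ('OP', '+'), ('IDENT', 'y')]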
No offence, but parsing (in a compiler, not in general) is probably the most boring part of a compiler.
As this is a PhD course, I'd expect the goal of it to be preparing students for research in the field, not writing parsers for some industrial company.
Also, there are definitely courses that teach you how to write a parser. But we usually don't call them advanced.
> No offence, but parsing (in a compiler, not in general) is probably the most boring part of a compiler.
Not if you're writing an IDE. Writing a batch-mode parser that takes an entire correct source file and produces an AST is easy. Writing a parser that instantaneously produces an updated AST while the user is typing in the middle of the program, while still allowing semantic analysis to occur and without vomiting out a pile of extraneous errors or giving up entirely, is a lot harder.
Phew, how boring! I know compiler people who (apparently) spend their time at dayjobs writing IDEs, then get back home and write books on good ol' "batch mode" processing. Can't wait for them to add a JIT chapter. But imagining them working instead on a "writing an IDE" chapter would be rather funny ;-).
You could consider this an observation that writing a batch mode parser is so much easier that I'm comfortable writing a book about it, but I still don't know enough to write about IDE-grade parsers and analyzers.
This thread is so weird. The topic at hand is a compilers course. ibains complains about a desire for qualified IDE developers. It's like complaining that your family doctor isn't an orthopedic surgeon. Why should a compilers course spend the requisite time on these highly specialized IDE algorithms?
Specifically. What part of this course should be removed to make room for IDE-specific algorithms? Or, would an advanced IDE algorithms course (with this course as a prerequisite) be a more sensible approach?
I saw this great interview recently with Anders Hejlsberg at MSFT on how modern compiler construction differs from what is taught in traditional University courses. Is this what you're alluding to?
After watching that interview, it's strange to read a comment like yours. While the architecture is completely different, it doesn't frankly seem like that big a leap to go from "senior engineer with X years working on thing that's tangentially related to compilers" to being able to be productive in a reasonable amount of time working with the new architecture. What am I missing?
I will say that I asked Shriram Krishnamurthi about why books on interpreters and compilers tend to avoid the question of parsing like the plague, and found his response a little unsatisfactory.
All these celebrated texts like SICP, EOPL, and Lisp in Small Pieces avoid the question of parsing altogether by focusing on s-expression based languages.
His response was effectively that parsing is over-emphasized in other books, and it's truly orthogonal to other PL activities. Which I can understand, but in practice when I need a parser in my day job I usually don't have much say in the structure of the input "language" (since I'm usually parsing something coming from another source that I have little control over). And it would seem if you have an instructor in a compilers course with this point of view, what other class would give you an opportunity to learn parsing in a formal setting?
A full half of the compilers course I took in undergrad was about {LA,S,}LR; I think "parsing is over-emphasized" is alive and well... That said, it didn't cover non-table-based parsing much. (On the other hand, I'm unaware of a non-table-based parsing algorithm that can statically warn you about problems with the grammar (ambiguities etc).)
Hey, I know Shriram! We were students at Rice together. Not saying we knew each other well, more like passing acquaintances, but I recognize his name.
Anyway, as I recall from compilers classes (it has been ~30 years), the Dragon book heavily emphasized parsing, but the professor said "I'll assign you a parser to do by hand for homework, and that's it".
The overall attitude was a combo of: 1) parsing theory is also covered in another theory-of-computation class (where the classic textbook at the time was "Introduction to Automata Theory, Languages and Computation" by Hopcroft and Ullman, but these days may also be Sipser's book); 2) there were so many other topics to cover in compilers that a week or so of parsing was all the time budget allowed; and 3) the research focus of the professor's work was in other areas, and compiler research in general was the same.
So not enough attention to parsing might be a side effect (perhaps unfortunate). Even if there is enough material to have an entire class on parsing.
I took the advanced course in compilers too, which was more like a topics class - a research paper every week or so. I don't remember anything about parsing there either.
> why books on interpreters and compilers tend to avoid the question of parsing like the plague
They avoid it like an old boring anecdote everyone already read in their childhood in the Dragon Book.
> parsing is over-emphasized in other books, and it's truly orthogonal to other PL activities.
Right, "parsing science" != "compiler science". Parsing science isn't even a subset of compiler science. It's just another discipline which touches (someone may even say overlaps) with it. Consider parsing a natural language. (Which btw any human does, but compiler stuff is surely not for a every human.)
It's pretty hard to pick up compiler skills because very little of it is written down. It takes a lot of time (a few years) working with code bases and papers to absorb it.
I smell a contradiction. But I'm glad that even well-known industry people see it like that. It's not a common flu you can pick up at the university library. It's arcane lore: you need to travel far away and dig into the ruins of the Library of Alexandria to find the knowledge.
I don't see the contradiction - very little of it is written down, and so you need to spend a lot of time scraping around for the disparate parts that are written down and trying to absorb as much as possible for them. There are very few (in some parts of the field none) books that bring together all the information.
Good, good. Reminds me that stealing your https://rubybib.org/ idea for Python has been on my todo list for a long time ...
Actually, my own reference to the Library of Alexandria, initially made as a figure of speech, is a literal situation in some parts of the realm. Reminds me I wanted to put up a mirror of Rice's Massively Scalar Compiler Project materials somewhere. In my mental map, Cooper and company literally saved SSA from IBM's freezing hell (quoting Kenneth Zadeck: "What Happened Next / We stopped working on SSA. / None of us actually worked on a compiler project. / We were blocked from transfering SSA to the IBM product compilers.").
Really should do that over the winter holidays. By then, we'll also know whether http://ssabook.gforge.inria.fr/latest/ has actually gone down. But at least that is already mirrored...
Yes, more validation - compilers as taught in universities need improvement.
So, a startup of 10 people should hire and train people for months? Raise a $2M seed round, have 12-18 months to show product-market fit, and as a founder teach all engineers their jobs? How many engineers go into compilers after school? So university education is all they know. The industry is built on qualified engineers. The NVIDIA CEO was ahead of the curve on graphics cards, the Google founders were ahead in search; why are the compiler folks behind the curve?
I suspect it’s because there aren’t a lot of high quality libraries you can integrate into the backend of compiler tools that don’t run into license issues pretty fast. Imagine if GNU binutils were more permissively licensed and as modular as clang: then developing novel, non-GPL’d compiler infrastructure could depend on BFD - the boring part, where work won’t bring bona fide improvements to your new compiler. Another factor is that LLVM’s quality and ubiquity has reduced the monetary and technical upside to pursuing new opportunities in compiler development.
Parsing is the frontend of a compiler. That the compiler course focuses on the compiler backend does not mean it's not relevant to the industry. I personally find aliasing, flow analysis, and optimization very relevant to what we needed to do.
As for industry vs academia, wouldn't it be fair to say that professors should teach whichever is more advanced? I certainly don't believe that teaching Java generics, no matter how relevant that is, belongs in a PhD program, which is supposed to push the state of the art of a subject.
> Good luck hiring an engineer with advanced compiler courses knowing any of this.
Have you tried raising the level of compensation you offer? There are engineers elsewhere doing this at other companies you could presumably get if you put your hands in your pockets?
I'm normally fairly laissez-faire about the morality of "exploiting" employees, but that works both ways - if you can't find people for the amount of money you're offering then tough shit
It's not true that there aren't engineers with these skills - there are thousands of them in the US and more worldwide - so if you can't hire them then either:
* you aren't looking (seems unlikely)
* there's something toxic about your team that means they'd never work for you and you should fix that (hopefully unlikely)
* you aren't paying enough to make the move worthwhile for them (seems likely as people never want to pay the going rate)
That's why I presumed salary. Double your salary and you may suddenly find you're being flooded with contacts from the compiler engineers at places like Google. If that's the case then your only problem was the compensation you were offering and not a problem with education.
> We pay very very well, all my peers from NVIDIA are above 700k.
That's... an impressive number.
> Why would you presume anything about salary?
Because that's usually the reason, especially when speaking of what universities teach. "We can't find people who know $TECH" is, under those circumstances, almost always corp-speak for "We can't find new grads who already know $TECH to fill our entry level positions for what we're willing to pay in order to replace the employees that learned it on the job and then leave when we don't give them raises after a couple of years."
Have you posted the positions you're trying to fill here on the monthly "Who is Hiring?" thread?
As an out of the box suggestion, consider sponsoring the creation of an online course (like a Udacity nanodegree, perhaps) that covers the curriculum material you consider missing. A comprehensive set of 'build your own programming language and tooling' courses would probably be quite popular.
Are you hiring massive programming language design & implementation enthusiasts familiar with the whole stack (parsing, type systems, verification, compilation, runtime, GC, ...) including all kinds of different parsing algorithms?
> These courses teach very little judgement or industry relevant stuff.
Hasn't that been the debate for, like, ever?
Should the universities teach you how to learn or should they teach you how to hit the ground running at your first post-graduation job?
I'd say anyone who took phd-level courses on compiler design shouldn't take too much training to become a valuable team member as they've proven they can learn what is important for the given task but maybe that's just my uninformed opinion.
I feel like "teach you to learn" vs "teach you useful skills"is a false dichotomy, and often just an excuse for poor teaching. They can and should do both. The reasom they don't is because they're so research focussed, which is fine as an option but a problem because there are no or vert few good options for high level academic courses that are more practically focussed.
> I’d like to sit down all university professors who teach compiler courses and teach them a course on what’s relevant.
Courses should be used to teach general concepts, while industry should be used to show how to apply those concepts. The choice between using a packrat parser, LALR, or LL would be easier to explain to someone who has been exposed to those concepts and understands the advantages and disadvantages of each approach than to someone who has never heard of the terms.
Best practices change with time, but knowing the base concepts and how to apply them will help people adapt. If they're only exposed to what's considered relevant at this time, then they won't have the necessary exposure to concepts that may pertain to future best practice.
You can hire bright, motivated and mature people and teach them the skills they need. Academia doesn't exist to provide profit-maximizing, ready inputs to business. They produce people who have the skills to (hopefully) be able to grow into any position.
I have some idea what I'm talking about (see my post https://news.ycombinator.com/item?id=25210523), and I don't know why you'd use a packrat parser over anything else. I pull something off the shelf, ANTLR for my latest project, then get on with it.
The parser is the most trivial side of things; if you consider that a key skill, it comes across as an issue from the 1970s, not today. Writing an RD parser by hand is not a big deal anyway, as someone else said.
If you're into optimising compilers, you're focusing on a very odd area to criticise. I'd have thought more about dataflow using lattices or whatever, graph stuff, etc.
So parsing aside, what are the 'relevant' skills you feel are missing?
(and what are 'Database Compilers'? Nearest I can think of that may fit are query optimisers)
As someone who is happy when he interviews a candidate who at least knows you don't start parsing with `string.split()` I gotta say, your bar is extremely high. Surely MSFT can afford some internal training?
> I’d like to sit down all university professors who teach compiler courses and teach them a course on what’s relevant.
I'm sorry if you've already answered, or if it comes up a lot, but could you toss up some buzzwords/more industry-standard things you're looking for? I was going to look into this course, and still probably will, but if you're saying it's not relevant I'd like to know what to be looking into instead?
I second this (although I work more on translation and modeling than optimization). IME, it's much easier to find otherwise competent, savvy researchers and spin them onto compiler work than it is to take someone who's gone through an "advanced" course.
>So why not learn it in a week then apply for the job?
For a lot of reasons. I'm not looking to work in compilers. OP sounds like the opposite of the type of people I want to work with. I enjoy what I work on quite a bit. I am paid well. I have the chance to build up the business and a team.
It is just an attitude that I see a lot. The candidate doesn't have XYZ specific knowledge. Find someone who is interested and motivated (e.g. someone who has independently taken an advanced course in a topic) and build them up.
I'm sorry that some candidate can't answer your trivia question, but if it could be taught in a 16 week course (~10 hours / week), you could find a way to convey it on the job (~40 hours / week) in a much shorter period of time.
For this scenario have new hires work through a series of designed problems. Have them implement and run into the pitfalls with LL, LR, and Packrat parsers. Show issues with left recursion, shift / reduce conflicts, space trade offs of Packrat, seriously whatever you want to demonstrate.
Even just write up a document explaining when to use each rather than bemoaning the lack of knowledge. Like how much cooler would the top comment in this thread have been if I had learnt something?
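For example, the left-recursion pitfall fits in a dozen lines: a naive recursive descent procedure for a left-recursive rule recurses before consuming any input, and the standard fix turns the recursion into iteration (toy sketch, illustrative names):

    import sys

    # A "designed pitfall" in miniature: the left-recursive rule
    #   expr := expr '-' NUM | NUM
    # sends naive recursive descent into infinite recursion; the standard
    # fix rewrites it as iteration (expr := NUM ('-' NUM)*). Sketch only.

    TOKS = ['1', '-', '2', '-', '3']

    def expr_left_recursive(i=0):
        left = expr_left_recursive(i)   # recurses before consuming input: boom
        ...

    def expr_iterative():
        i = 0
        value = int(TOKS[i]); i += 1
        while i < len(TOKS) and TOKS[i] == '-':
            value -= int(TOKS[i + 1]); i += 2   # left-associative, no recursion
        return value

    try:
        sys.setrecursionlimit(100)
        expr_left_recursive()
    except RecursionError:
        print('left recursion: stack overflow')

    print(expr_iterative())  # -4, i.e. (1 - 2) - 3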
This feels a lot like "I want a candidate with the knowledge of a senior for the price of a recent grad".
This would lead to incredibly toxic situations.
If I want to apply to 5 companies, and all 5 have one question - each a different one - that I could learn in a week, would you consider it more efficient for any party for me to learn all of this in advance?
Also, this assumes it is known upfront what this required knowledge would be...
This is such an empty criticism. Universities are not job training facilities. You want people who have the exactly right skillset? Hire smart kids and train them, you know, the way it's been done for hundreds of years.
I agree in general. I have a PhD in computer engineering and know a lot of academia-type subjects, but only a small portion is especially useful in a practical setting. Also, a lot of academic knowledge seems needlessly obtuse. It becomes a lot clearer when looking at the practical problems that spurred the academic fields. At times I get the aha as to why some abstract concept is addressing a meaningful question, and if that's how academic education oriented itself - around these basic meaningful questions - it would help people learn much more effectively.
Domain-specific compilers are a bigger field, where there's more opportunity to do something interesting
Take C (or some other language) and its mature optimizing compilers as a foundation, and write a program that compiles a high-level description of a domain-specific program into C
For example, NN-512 (https://NN-512.com) is a compiler that takes as input a simple text description of a convolutional neural net inference graph and produces as output a stand-alone C implementation of that graph
GNU Bison (https://www.gnu.org/software/bison/) is another example, where the output is a parser: Bison reads a specification of a context-free language, warns about any parsing ambiguities, and generates a parser (either in C, C++, or Java) which reads sequences of tokens and decides whether the sequence conforms to the syntax specified by the grammar
Halide (https://halide-lang.org/) is a language for fast, portable computation on images and tensors, and it originally generated human-readable C++ code as output (but not anymore, now it feeds directly into LLVM)
Can anyone provide other examples? (Code generators for network packet processing?)
> Take C (or some other language) and its mature optimizing compilers as a foundation, and write a program that compiles a high-level description of a program into C.
Don't you still need many of the techniques taught in this class to build something like that? The material doesn't look inherently targeted at low-level languages or machine-code as a target.
One way to think of it is to split the "compiler" problem into three big pieces. 1) Front-end, 2) back-end, 3) tooling.
Front end is lexing, parsing, building the AST (in whole or just keeping pieces of it around), semantic analysis. When that is done, you can say "Yup, valid program. It is possible to generate code." So, to your question, yes. Those techniques from the course are 100% in play.
Back-end is turning the AST into something you can optimize and generate code from. (Sometimes there is a "middle-end" in there....) High-level optimizations, low-level optimizations, memory allocation, register allocation.
Tooling is everything that keeps your life from being miserable as a user -- starting with debug symbols and these days people expect library and package management, etc, etc.
So if you are interested in exploring the Next Great Syntax or some semantic problems that can be re-written into C, then doing a C-front is a great way to do that. Let the C compiler handle all the really-really-really-hard back-end issues for you, and avoid all that distraction. But.... expect that getting debug symbols from your spiffy language into the ready-to-link object file is going to be, as they say, "non-trivial". So... great for experiments and a great learning exercise, but it's hard to do a production compiler that way.
Hi! This course is about the "middle end," FWIW. We do not do parsing or codegen in 6120, and there is no goal (as there is in many undergrad-level compilers courses) of making a complete, end-to-end compiler for a C-like language.
I have just spent the last 13 months significantly reworking a compiler that turns Excel-like formulas into JavaScript. The compiler is used by an app designer, and the formula language is meant to be easy to pick up for people used to spreadsheets. A significant difference compared to Excel is that our compiler features static typing, enabling helpful error messages to be produced at build-time.
(Spreadsheets, on the other hand, tend to try very hard to make sense of types at runtime, using type coercion to turn values of the wrong type into something of the expected type they can work with. That produces surprising results and confuses users. =IF(2, 3, 4) will generally produce 3, because 2 -- which should be a logical/boolean value -- is interpreted as TRUE. We think it's a better idea to warn the user that the first parameter is wrong, and to do that as soon as possible.)
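Here is that coercion-vs-checking difference as a toy sketch (illustrative types only, not Calcapp's actual code):

    # The =IF(2, 3, 4) example in code: a dynamic spreadsheet coerces the
    # number 2 to TRUE at runtime, while a static checker rejects the
    # formula before it ever runs. Illustrative sketch only.

    def spreadsheet_if(cond, a, b):
        return a if bool(cond) else b        # runtime coercion: 2 -> TRUE

    def static_check_if(cond_type, a_type, b_type):
        if cond_type != 'LOGICAL':
            return f'build error: IF expects LOGICAL, got {cond_type}'
        return a_type if a_type == b_type else f'{a_type}|{b_type}'

    print(spreadsheet_if(2, 3, 4))                       # 3 (2 coerced to TRUE)
    print(static_check_if('NUMBER', 'NUMBER', 'NUMBER')) # build error: ...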
By far the largest challenge with reworking the compiler has been the type system. I have expanded our type system from the four standard spreadsheet types to encompass more than 80, as well as added support for lambdas (for use with functions like SUMIF and FILTER), type inference, parameterized types and union types. The biggest addition is probably an imperative mode, that executes formulas with side-effects in response to events being fired. This is all in service of features we have wanted to add for years, that have been held back by our (previously) simplistic type system.
I did take a compiler class at university, more than 15 years ago, but while that proved very useful when writing the formula parser (an operator-precedence parser, initially straight out of a textbook), type theory was not part of the class. I had to pick up bits and pieces by reading the Java Language Specification (JLS) and by experimenting.
The Cornell class doesn't seem to feature type theory either, which I think is unfortunate. Now that the pendulum has swung, yet again, firmly in the direction of static typing, perhaps static typing will become a bigger part of CS compiler curricula.
Your project sounds like fun. What/for whom is the project? What level of Excel formula compatibility are you shooting for, e.g. just in the neighborhood like Google Sheets, reasonably close as in Libre Office, or full compatibility with the latest version? If the latter you must have one gnarly test suite. Would love to hear more about it.
The compiler is for Calcapp, an app designer for Excel-savvy users. Our aim is to offer a function library that is easy to pick up for our target audience. That means that we generally follow Microsoft's lead in terms of naming and parameters, but we often offer extensions. Here are a few examples:
* Functions like SUMIF operate on a range or an array and use a textual mini-language to determine which values to operate on. =SUMIF(B1:B2, ">3"), where B1 contains 2 and B2 contains 4, returns 4, because only B2 satisfies the ">3" condition. We're not big fans of that mini-language (which expects you to use the string concatenation operator & to handle dynamic values). We support it, through an overloaded function, but also offer a version which takes a lambda as the second parameter. In Calcapp, SUMIF({ 2, 4 }, Item > 3) and SUMIF({ 2, 4 }, E -> E > 3) (with a named lambda parameter) are equivalent to SUMIF({ 2, 4 }, ">3"). The lambda syntax enables much more complex conditions to be expressed.
* Some functions like SORTBY take a sort order parameter, which is an integer where 1 means sort in ascending order and -1 means sort in descending order. We support functions with integer sort order parameters, but also offer an overloaded version taking an enumerated value instead (SortOrder.ASCENDING or SortOrder.DESCENDING), which we think helps with readability.
* Excel's logical operators (like =, <> and <=) return arrays when applied to arrays and ranges. For instance, { 1, 2 } = { 1, 3 } returns { TRUE, FALSE }. (This is easier to see in recent versions of Excel, where array results "spill," meaning that a return array with two elements spills to occupy two cells of the grid. In earlier versions, as well as Google Sheets, you'll have to use INDEX to tease out the results.) We support = and <>, but also == and !=, where the two latter operators always return scalar (non-array) values. == enables users to determine if two arrays are identical, without having to use AND on the result array (which is the standard Excel trick). Also, our == operator is more traditional in that string comparisons are case-sensitive (unlike =).
* Experienced Excel users like to use double negation to turn logical arrays into number arrays. We have a specialized operator, --, for that, because allowing two unary operators to follow one another would introduce other issues. Here's an example: https://exceljet.net/excel-functions/excel-sumproduct-functi...
* The Excel operators * and + can be used with logical arrays and ranges to signify logical AND, as well as logical OR, respectively. (https://exceljet.net/formula/sum-if-cells-contain-either-x-o...) We have overloaded versions of those operators that take logical arrays for that reason. (Again, Calcapp uses static typing, which makes this a lot more complicated for us than it is for Excel.)
* The latest versions of Excel have great support for dynamically-allocated arrays (along with new array functions like FILTER and SORT). Their new calculation engine supports users providing arrays where, traditionally, non-array parameter values are expected. If arrays are provided for these parameters, then an array is returned. For instance, =SQRT({ 4, 16 }) returns { 2, 4 }. Microsoft internally calls this "lifting" parameters. We refer to these parameters as being "array-capable." In our type system, an array-capable type is a union type (T|Array<T>), with the notable difference that if an array is provided to an array-capable parameter, then the return type is no longer T, but Array<T>. (A sketch follows below.)
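As flagged in the last point above, a sketch of how an array-capable parameter can be modeled (illustrative names, not our actual implementation):

    # Sketch of "lifting"/array-capable parameters as described above:
    # a parameter typed T|Array<T> returns T for a scalar argument and
    # Array<T> when a lifted argument is an array. Illustrative names.

    def lifted_return_type(param_elem_type, arg_type):
        """arg_type is either 'T' or ('Array', 'T')."""
        if arg_type == param_elem_type:
            return param_elem_type                  # scalar in, scalar out
        if arg_type == ('Array', param_elem_type):
            return ('Array', param_elem_type)       # array in, array out
        raise TypeError(f'expected {param_elem_type} or Array<{param_elem_type}>')

    def sqrt_lifted(x):
        # runtime counterpart: =SQRT({ 4, 16 }) -> { 2, 4 }
        return [v ** 0.5 for v in x] if isinstance(x, list) else x ** 0.5

    print(lifted_return_type('NUMBER', ('Array', 'NUMBER')))  # ('Array', 'NUMBER')
    print(sqrt_lifted([4, 16]))                               # [2.0, 4.0]
    print(sqrt_lifted(9))                                     # 3.0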
We do have a test suite for the runtime formula function implementation with a couple of thousand unit tests. Given the number of supported formula functions, though, we should probably have many more.
Most of what I have described here pertains to a new Calcapp version, which hasn't been released yet. In particular, while I am done implementing the compiler, it still needs to be documented (lots of Javadoc), user-facing documentation needs to be written and the runtime system needs to be implemented, so we still have a few (fun!) months to go before everything is ready.
AnyDSL (https://anydsl.github.io/) might be worth mentioning: it is a framework for creating domain-specific languages that uses partial evaluation to give the user powerful abstractions while still producing high-performance code for different target platforms.
On the website, you can see that they already have DSLs for ray tracing and image stencil codes, as well as work on bioinformatics.
Not open source, but I'm working on a system that turns a DSL (embedded into Jupyter notebooks) into either C or C with OpenCL for a niche finance application. It's not a huge project (<10k lines of code). The nature of the problem means the DSL can be very simple (it's aimed at non-programmers). This approach also means we could easily add other targets, e.g. CUDA or WASM.
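For a flavor of the general approach, here's a toy sketch (in Java for illustration; hypothetical names, not my actual system): walk a tiny expression AST and emit the equivalent C source. Another backend (OpenCL, CUDA, WASM) is essentially just another emitter over the same AST.

    // Toy DSL-to-C emitter: each AST node renders itself as C source.
    interface Node { String toC(); }

    record Num(double v) implements Node {
        public String toC() { return Double.toString(v); }
    }
    record Var(String name) implements Node {
        public String toC() { return name; }
    }
    record Add(Node l, Node r) implements Node {
        public String toC() { return "(" + l.toC() + " + " + r.toC() + ")"; }
    }

    class Emit {
        public static void main(String[] args) {
            Node e = new Add(new Var("price"), new Num(0.5));
            // Wrap the emitted expression in a C function definition.
            System.out.println("double f(double price) { return " + e.toC() + "; }");
        }
    }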
I find the presentation style of subtitles with surrounding context extremely helpful. I often lose the ability to listen when concentrating; the subtitle context provides a much better recovery mechanism than the usual one of trying to work out what was missed. I look forward to purchasing augmented-reality glasses at some point in the future which add this advantage to in-person interactions.
It’s the complete opposite of, and significantly more user friendly than, how zoom recordings synchronize meeting subtitles with audio/video playback.
In their case, they only show the current line in a text-message-like sidebar. As the content plays, it snaps the scroll position of the messages to the current one, hiding the rest. This breaks scrolling ahead or back while listening, and you lose any context.
Hopefully I’m just missing a setting - I try to minimize the time I spend in there. Is there a way I can take their damn horse blinders off?
I too appreciate the subtitles, and I tend to have them on when I watch+listen to a DVD. You can even click on a sentence and the video jumps to that point. However ... a little proofreading would help here: "To get things started" gets transcribed as "Ticketing, sir."
“Real 6120 also has an end-of-semester course project—in the self-guided version, your end-of-semester assignment is to change the world through the magic of compilers”
I wish there were more courses on compilers that also serve as language servers. The current world is moving towards a phase where a language needs full IDE tooling: compilers, linters, formatters, a REPL, language servers, debuggers and profilers.
Would love to see a high level course that covers how to build the tooling for a great developer experience.
As I mentioned in another thread, a uni course that has students work on an existing codebase instead of writing something from scratch might be even better - though harder to grade.
Imagine if one could make a major feature contribution to clangd/pyls/rust-analyzer as part of an advanced compilers course. The student gets a better understanding of (open-source) software development practices and gains deeper proficiency in the language (not just the APIs), the project gains new contributors, and users (including the student herself) benefit from the new feature.
My uni, NC State, has a course (CSC 326) called Software Engineering that has students, in teams, contribute to an existing project created by the staff, with the best contributions being merged into master for the next semester's students to work on.
Same here in Munich: we had a course on software testing in OOP languages and had to write unit tests for some modules of Jenkins, which were graded and eventually merged into master.
Sorry - should've been more clear. It's a project written by the staff in Java and Angular. It's been used continuously through the years, evolving each semester with the best contributions being merged. I believe it's a health management service, I haven't taken the course yet.
If you have to stop "early," please get through the simple DCE (dead code elimination) and LVN (local value numbering) passes.
A solid basic-block-level DCE & LVN optimization, along with a judicious "compile-time evaluator" (simple constant folding), will let you produce code at around 80% of the level of LLVM.
Obviously, you're not going to get the magic that comes from LLVM's scalar analysis pass, but you're not going to be embarrassed by your performance, either.
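To give a sense of how little code this takes, here's a minimal single-block DCE sketch over a made-up three-address IR (illustrative names; liveOut stands in for the cross-block liveness information a real pass would compute). LVN is similarly compact:

    import java.util.*;

    // Toy three-address instruction: dest = op(args). dest is null for
    // purely side-effecting ops (print, store), which are never removed.
    record Instr(String dest, String op, List<String> args, boolean sideEffect) {}

    final class Dce {
        // Walk the block backwards, tracking which names are still needed.
        static List<Instr> eliminate(List<Instr> block, Set<String> liveOut) {
            Set<String> needed = new HashSet<>(liveOut);
            Deque<Instr> kept = new ArrayDeque<>();
            for (int i = block.size() - 1; i >= 0; i--) {
                Instr ins = block.get(i);
                boolean dead = ins.dest() != null && !needed.contains(ins.dest())
                        && !ins.sideEffect();
                if (dead) continue;                       // result never read: drop it
                kept.addFirst(ins);
                if (ins.dest() != null) needed.remove(ins.dest());
                needed.addAll(ins.args());
            }
            return new ArrayList<>(kept);
        }

        public static void main(String[] args) {
            List<Instr> block = List.of(
                new Instr("a", "const", List.of("4"), false),
                new Instr("b", "const", List.of("2"), false), // dead: b never read
                new Instr("c", "mul", List.of("a", "a"), false),
                new Instr(null, "print", List.of("c"), true));
            eliminate(block, Set.of()).forEach(System.out::println); // "b" is gone
        }
    }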
I'm interested. I'm writing a compiler-interpreter now, but my broad compiler-construction chops are spotty, and going through this systematically might be useful.
I would love to do so! I'm a master's student who has been wishing there were an advanced compilers course in my program (there's only the undergrad-level one), so this is perfect! Let me know how I can get in contact.
I’m interested. I may not have the most relevant background, though — I’m a data scientist with an econometrics education. The topic really does interest me, though.
Okay, I just created this on Discord: https://discord.gg/RBPmmdWg. Figured it might be more accessible for people. Wondering if I should post this outside of the thread.
Crafting Interpreters has been great. It starts with an AST-walking interpreter (in Java), so you get to focus on parsing and the simplest form of interpretation, and then it moves to a bytecode interpreter (in C) so you get to learn about compiling to a more primitive target via a lower-level language (much of which should translate to a "real" compiler). Very approachable, has lots of side-context and an entertaining style. The language you interpret was custom-designed for the course, which helps a lot, and resembles a simplified JavaScript without all the messy parts.
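If it helps to picture the first half of the book: a tree-walking interpreter is just a recursive eval over the AST. A minimal sketch in Java (jlox itself uses the Visitor pattern and a richer object model; eval is inlined here for brevity):

    // Each AST node evaluates itself; interpretation is a recursive walk.
    interface Expr { double eval(); }

    record Literal(double value) implements Expr {
        public double eval() { return value; }
    }

    record Binary(Expr left, char op, Expr right) implements Expr {
        public double eval() {
            double l = left.eval(), r = right.eval();
            return switch (op) {
                case '+' -> l + r;
                case '-' -> l - r;
                case '*' -> l * r;
                case '/' -> l / r;
                default -> throw new IllegalStateException("unknown op: " + op);
            };
        }
    }

    class Walk {
        public static void main(String[] args) {
            // The AST a parser would build for (1 + 2) * 3.
            Expr e = new Binary(new Binary(new Literal(1), '+', new Literal(2)),
                                '*', new Literal(3));
            System.out.println(e.eval()); // 9.0
        }
    }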
In particular (in alphabetical order; I think these are all great, so I'm including highlights of what I've liked about each):
- IU P423/P523: Compilers (Programming Language Implementation) - Jeremy Siek, with the course book "Essentials of Compilation: An Incremental Approach" (a pretty interesting approach: programming language features are developed incrementally, with a fully working compiler at each step, cf. http://scheme2006.cs.uchicago.edu/11-ghuloum.pdf; implementation language Racket),
- KAIST CS420: Compiler Design - Jeehoon Kang (good modern treatment of SSA representation itself, including the use of block arguments, https://mlir.llvm.org/docs/Rationale/Rationale/#block-argume..., as well as SSA-based analysis and optimization; Rust as an implementation language),
- UCSD CSE 131: Compiler Construction - Joseph Gibbs Politz, Ranjit Jhala (great lecturers, both Haskell and OCaml edition were interesting; fun extra: one of the Fall 2019 lectures (11/26) has an interesting discussion of the trade-offs between traditional OOP and FP compiler implementation),
- UCSD CSE 231: Advanced Compiler Design - Sorin Lerner (a follow-up to UCSD CSE 131, with more depth on analysis & optimization: data flow analysis, lattice theory, SSA, optimization; fun extra: the final Winter 2018 lecture highlighted one of my favorite papers, https://pldi15.sigplan.org/details/pldi2015-papers/31/Provab...),
- UW CSE CSEP 501: Compilers - Hal Perkins (nice balanced introduction, including x86-64 assembly code generation, with the aforementioned "Engineering a Compiler" used as the course textbook).
Thank you so much for this comment. It's incredibly difficult to find modern compiler learning resources – when I ask, I often get disparaging comments about the dragon book and not much more.
Please, consider putting this list on your blog, or somewhere more visible than a HN comment.
How's that possible? It's a "keep it under your pillow" book. Whether you read it or not is a different question. I don't know anyone in the 'hood who wouldn't have.
I took the undergraduate version of this course while at Cornell. It was a wonderful course. It taught me a lot about how computers work, and how to organize a large (for me at the time) software project.
One of the most painful parts of the project was using Scala, whose ridiculously long compile times made iterating quite a pain.
My favorite memory was that for one project we had to submit a program meant to stress-test everyone's parser.
Cornell student here that's taken a bunch of classes at all of the levels. In my experience, undergraduate and master's classes are taught with the expectation that you're taking the course as a requirement, likely alongside a full load of other courses, so they are much more evaluation-heavy. PhD-level courses generally assume the course is the only one you're taking that semester, and that you're taking it because the material will be useful for your research rather than for the credential, so there's much less emphasis on evaluation; i.e., a smaller number of homeworks that are each much harder than in an undergraduate version of the same class, with final evaluations that are much closer to reading and understanding recent research papers than to a timed exam.
> the difference between masters and PhD has nothing to do with classes.
Huh? What's your education system? In Europe it's generally a 3-year bachelor's, then a 2-year master's, then a 3-year PhD, so when you have classes during your PhD, hopefully they are different from the ones you get during your master's.
Depends on the field. I studied philosophy (Ph.D. program, but finished with an M.A.), and the M.A. is often just a degree given after the first few years of the Ph.D. There are a handful of required first-year courses (logic, basic metaphysics, philosophy of science and ethics), but otherwise students take whatever courses are relevant to their interests. It helps that philosophy typically has fewer classes with explicit prerequisites. If you take an advanced course, you might be lost, but it doesn't feel the same as not knowing a formula or how to derive something.
Some courses are more basic, and some are more like collaborative research seminars. Advanced students gravitate towards the latter (and typically take fewer courses and/or audit more of them), but a first year with the right background can still take them. Conversely, an advanced student might take an introductory course in an area they lack background in to broaden their knowledge.
There are also terminal M.A. programs, but they're less common, and the majority of students who do Ph.D.s at prestigious institutions don't do one.
I am in the US, and at my school masters and PhD students all take the same classes.
It is not ideal, because many of the master's students come from a non-CS background. Some of the graduate-level CS courses I have taken were easier than 4th-year undergrad classes.
By the time I'd finished graduate classes, I had a hard time with proofs. After the 10th or 12th introduction to predicate logic with slightly different notation, it just all becomes a big, confusing mass.
I honestly have never figured out European education systems. Sorry!
But this is Cornell, which I assume is like other research-oriented US universities. You have a 4 (or maybe 5) year undergraduate program leading to a Bachelors, and then you apply to graduate programs. The graduate schools may have specific Masters tracks, but not really any general Masters program. Instead, you spend the first year or two taking graduate classes until you either satisfy the assorted requirements or pass an oral qualification (they still have those, right?), after which you focus entirely on research, publications, and your dissertation defense. If you leave school before defending, you get the Masters as a sort of consolation prize; most US PhDs in my experience don't have a Masters.
In the US, it's typically a 4 year undergrad, 2 year masters, and a ~5-7 year PhD depending on the program itself. From what I understand, while there are some classes you take as a PhD student, a majority of your work is doing research
For PhD students, the masters is integrated into the 5 years they spend doing their PhD. They are expected to take courses as a "pre-candidate" and do primarily research after they pass their qualifying exams and become candidates.
This is usually true but not always. I attended an economics program with a terminal master’s offering, and we spent a year and a half as master’s students before moving on to PhD work.
Just for completeness' sake: in the EU, a bachelor's is 3 years, a master's 1 or 2 (2 being the standard) and a PhD 3-5 years. It depends on the country, so it's possible to end up with 3+2+5.
There is no formal difference between masters and PhD courses. I think he calls it a PhD-level course because it is "research oriented" and requires work that a Masters student looking to complete credits might not be interested in.
The evaluation criteria are also fairly subjective and rough, which might not work for a larger masters-level course.
Funny you mention the 6502, because as wonderful as its assembly is, it's not that friendly to higher-level languages. I tried writing a basic C-like language for it with 16-bit pointer support (see LDA ($ZP),Y), and came to the haunting realization that function calls require pushing the dereferenced zero-page pointers onto the stack, which changes their dereferencing address space from 0-255 (e.g. main) to 256-511 for subsequent stack frames.
I then realized that at a speed of 1 MHz, the pointer dereferencing, the stack-frame magic, and the endless copies needed just to evaluate basic expressions made the 6502 a pointless target for a C-like language.
Question for students: how do you find synchronous courses, with some discussion during class, versus asynchronous courses, where discussion is limited to an online written forum or office hours?
Asynchronous discussions are durable as content (on the internet). They are also very useful for working students. But retaining them in memory takes focus.
I'm interested in the topic, but I'm a junior backend/data engineer. Do you think it could be helpful for my profile to dig into compiler concepts, beyond general culture?
There are various weird and wonderful niche languages out there, if you appreciate programming languages as an end in themselves. Two examples: Sentient [0] and Factor [1][2].
> The MiniSat solver is Sentient’s default. It is a version of MiniSat that has been compiled to JavaScript with Emscripten. This means that it can run in a web browser and it does not have any dependencies when run from the command-line.
The simplest case where they prove useful is usually when your regex is longer than 20 characters. Write a parser once, and it'll be readable to people other than yourself (and if it isn't, your schema is bad).
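As a rough illustration of the trade-off, here's a made-up example in Java: the same "key=value;key=value" shape as a regex and as a tiny recursive-descent parser. The regex is shorter, but each parser method reads like a grammar rule and is far easier to extend:

    import java.util.regex.Pattern;

    class KeyValueParser {
        // Regex version: compact, but opaque once it grows.
        static final Pattern KV = Pattern.compile("^\\w+=\\w+(;\\w+=\\w+)*$");

        // Hand-written version: input := pair (';' pair)*, pair := word '=' word.
        static boolean parse(String s) {
            int[] pos = {0};
            if (!pair(s, pos)) return false;
            while (pos[0] < s.length() && s.charAt(pos[0]) == ';') {
                pos[0]++;
                if (!pair(s, pos)) return false;
            }
            return pos[0] == s.length();
        }

        static boolean pair(String s, int[] pos) {
            if (!word(s, pos)) return false;
            if (pos[0] >= s.length() || s.charAt(pos[0]) != '=') return false;
            pos[0]++; // consume '='
            return word(s, pos);
        }

        static boolean word(String s, int[] pos) {
            int start = pos[0];
            while (pos[0] < s.length()
                    && (Character.isLetterOrDigit(s.charAt(pos[0])) || s.charAt(pos[0]) == '_'))
                pos[0]++;
            return pos[0] > start;
        }

        public static void main(String[] args) {
            String input = "host=local;port=8080";
            System.out.println(KV.matcher(input).matches()); // true
            System.out.println(parse(input));                // true
        }
    }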
Seems a pity to throw out the various advantages of the high-level solution just because your regex engine doesn't support a scalable syntax.
Advanced regex systems such as the Ragel parser generator support a composition-friendly syntax [0], but the average regex system doesn't, and I guess there's no quick fix for that.
A (not-)compiler can be made to look for issues in a codebase that already exists. It would need all the tools of a compiler; it's just that the target is the same as the source language.
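Something like this toy sketch, say (purely hypothetical, and lexical rather than AST-based; a real tool would want the parse tree and type information, which is exactly why it needs a compiler front end):

    // Toy "not-compiler": reuses compiler-style source scanning, but its
    // output is diagnostics about the source rather than translated code.
    class NotCompiler {
        static void check(String source) {
            String[] lines = source.split("\n");
            for (int i = 0; i < lines.length; i++)
                // Flag a classic Java bug: comparing strings with ==.
                if (lines[i].matches(".*(\"[^\"]*\"\\s*==|==\\s*\"[^\"]*\").*"))
                    System.out.println("line " + (i + 1)
                            + ": string compared with ==; did you mean .equals()?");
        }

        public static void main(String[] args) {
            check("String s = read();\nif (s == \"done\") stop();\n");
        }
    }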
It looks like it's more about generating IR. The professor wrote his own LLVM-like tool called Bril, and use of that tool seems to be the focus of the course.