Anders Hejlsberg on Modern Compiler Construction [video] (msdn.com)
191 points by ingve on May 12, 2016 | 40 comments



Anders Hejlsberg wrote the first mass-market Compiler+IDE with Turbo Pascal. What was unique was that TP would automatically bring up the editor and position the cursor on the offending error. Seems trivial now, but it was a game changer for writing code on a PC. TP sold for around $49. The competitor, Pascal MT+, sold for $400.

I doubt if Anders would remember me, but I was lucky enough to be contracted by Borland to write their first test suite for TP 4.5. It was their first object oriented language. The spec was one of the most beautiful and concise pieces of technical documentation I've ever read.


TP was indeed amazing, though I think you're misremembering — TP 5.5 was the first version to add OO extensions.

It came with a separate little manual just for the object-oriented stuff (beautifully typeset, with the same type of syntax flow diagrams that Apple used [1]; Borland's OO extensions came from Apple Pascal, after all), and I remember reading it cover to cover while on vacation when I was in my mid-teens, and having my mind blown by the ideas of object-orientation, which seemed like complete science fiction at the time.

Anything from your recollections you can add to this? Was there any collaboration with Apple at all?

[1] http://pascal-central.com/images/pascalposter.jpg


Yep, my bad regarding version; was relying on faulty wetware instead of a web search.

Not that I was privy to inside information, but it was a strange coincidence that Borland dropped its Basic compiler at the same time that Microsoft dropped its Pascal compiler.

Apple had a pretty cool OO Pascal that somewhat resembled Xerox PARC's Mesa.


There were many wonderful things about TP, but it started much earlier than that: BLS Pascal on the Nascom 2, COMPAS Pascal on CP/M, Turbo Pascal on CP/M, and finally TP on DOS. I met Anders at Herning Messen, where he demoed COMPAS Pascal (running a maze generator). From my recollection he was a few years older than me, which suggests he wrote BLS Pascal as a young teenager o_O!

This thread has a bit of this history: https://news.ycombinator.com/item?id=10202299


Last month I recovered an old backup tape with TP.EXE. Couldn't resist playing with it in DOSBox.

http://imgur.com/oT3u4fR

It really was a brilliant piece of software. 600KB.

ps: check out the text GUI shadows.


Turbo Pascal 6.0 and onwards were implemented using Turbo Vision, a really good, object-oriented UI framework [1]. Every view could render a sub-rectangle of itself on command into a buffer, which is how the UI was able to render drop shadows. The C++ version was open-sourced in 1997 and ported to a bunch of platforms, including Linux [2].
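To make the buffer idea concrete, here's a rough TypeScript sketch of how that kind of compositing can work (all names here are made up for illustration; Turbo Vision's actual API was different, and in Pascal/C++):

    // Sketch of buffer-based text-UI compositing with drop shadows.
    interface Cell { ch: string; dim: boolean; }

    class CharBuffer {
      cells: Cell[];
      constructor(public w: number, public h: number) {
        this.cells = Array.from({ length: w * h }, () => ({ ch: " ", dim: false }));
      }
      put(x: number, y: number, ch: string): void {
        if (x >= 0 && x < this.w && y >= 0 && y < this.h)
          this.cells[y * this.w + x] = { ch, dim: false };
      }
      // The "shadow" trick: keep whatever character is already in the cell,
      // but re-render it with dimmed attributes.
      dim(x: number, y: number): void {
        if (x >= 0 && x < this.w && y >= 0 && y < this.h)
          this.cells[y * this.w + x].dim = true;
      }
    }

    // Draw a filled box, then dim the cells one column right / one row down,
    // so the content underneath shows through as a shadow.
    function drawBoxWithShadow(buf: CharBuffer, x: number, y: number, w: number, h: number): void {
      for (let r = 0; r < h; r++)
        for (let c = 0; c < w; c++) buf.put(x + c, y + r, "#");
      for (let r = 1; r <= h; r++) buf.dim(x + w, y + r); // right edge shadow
      for (let c = 1; c <= w; c++) buf.dim(x + c, y + h); // bottom edge shadow
    }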

[1] http://www.sigala.it/sergio/tvision/html/index.html

[2] http://tvision.sourceforge.net/


Ah yes, text mode graphics! I spent a few thousand hours writing a text mode version of Augment, with TP. Amazing what you could do with 80x25x16 colors.


In terms of ergonomics, that amount of text was very nice. As a longtime Emacs user, I felt straitjacketed by their keyboard shortcuts.

ps: really, those pseudo-alpha-transparent text shadows ...


I always fancied the EGA text mode of 80x43x16 colors. Much prettier than 80x50.


To assist you all in reminiscing: a lot of TP manuals are at bitsavers:

http://www.mirrorservice.org/sites/www.bitsavers.org/pdf/bor...


As Anders said near the end of the video, if you want to know more, look at the source code[1]. Speaking as someone who's worked on it (so I'm biased), I feel it's pretty easy to jump in and edit (though I'd advise new people to avoid the typechecker unless you feel particularly adventurous; it's a multi-thousand-line monster of a file). There are piles of easy issues[2] that are waiting for community members to work on them.

By the way - one of the most meaningful comments he made in this talk (to me) was when he said that your parser had to "parse more than the allowed grammar" so you can provide better error messages. Compilers are, ultimately, tools for developers - so developer experience is paramount. This, I've found, is so very, very true in any smaller projects I've worked on, and is easily one of the first things neglected by some of my more algebraically inclined peers (who are very satisfied with a perfect ABNF and a parser which strictly adheres to it).
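To illustrate what "parsing more than the allowed grammar" can look like, here's a little TypeScript toy of my own (not TypeScript's actual parser): the grammar forbids a trailing comma in an argument list, but the parser accepts one anyway so it can emit a precise message instead of a generic "unexpected token".

    type Token = { kind: "ident" | "," | ")" | "("; text: string; pos: number };

    function parseArgs(tokens: Token[], i: number, errors: string[]): { args: string[]; next: number } {
      const args: string[] = [];
      i++; // consume "("
      while (tokens[i] && tokens[i].kind !== ")") {
        if (tokens[i].kind === "ident") {
          args.push(tokens[i].text);
          i++;
        } else if (tokens[i].kind === ",") {
          i++;
          // The grammar requires an argument after a comma, but tolerating a ")"
          // here lets us report a targeted, friendly error...
          if (tokens[i] && tokens[i].kind === ")") {
            errors.push(`Trailing comma not allowed before ')' at position ${tokens[i].pos}`);
          }
        } else {
          i++; // ...and skipping unexpected tokens keeps the parse going.
        }
      }
      return { args, next: i + 1 };
    }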

[1] https://github.com/Microsoft/TypeScript

[2] https://github.com/Microsoft/TypeScript/labels/Effort%3A%20E...


I've developed a couple of cool tricks to get very error-tolerant parsers (I design/build live programming languages). We can go pure shunting yard (a precedence parser), which will basically parse anything since it doesn't rely on grammar rules. Even when going with grammar-based parsing (it's convenient), braces can be matched on tokens in a pre-pass before parsing occurs, eliminating an entire class of difficult-to-handle errors and allowing for brace stickiness in an incremental edit context; no need to rebalance because someone typed an opening paren without brace completion!
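For anyone wanting to make the brace pre-matching concrete, here's a rough TypeScript sketch of the kind of pre-pass I imagine (my own guess at the details, not the parent's actual implementation):

    interface BraceToken { text: "(" | ")" | "{" | "}" | "other"; pos: number; }

    // Returns a map from each opener's index to its closer's index, plus the
    // indices of tokens that never matched. The parser can then treat an
    // unmatched "(" as a localized, recoverable error instead of letting it
    // unbalance the rest of the file.
    function matchBraces(tokens: BraceToken[]): { pairs: Map<number, number>; unmatched: number[] } {
      const pairs = new Map<number, number>();
      const unmatched: number[] = [];
      const stack: { text: string; index: number }[] = [];
      const openerFor: Record<string, string> = { ")": "(", "}": "{" };
      tokens.forEach((tok, i) => {
        if (tok.text === "(" || tok.text === "{") {
          stack.push({ text: tok.text, index: i });
        } else if (tok.text === ")" || tok.text === "}") {
          if (stack.length && stack[stack.length - 1].text === openerFor[tok.text]) {
            pairs.set(stack.pop()!.index, i);
          } else {
            unmatched.push(i); // stray closer: flag it, don't derail the parse
          }
        }
      });
      for (const open of stack) unmatched.push(open.index); // unclosed openers
      return { pairs, unmatched };
    }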

Though I can't help but think that someone will eventually develop an NN-based PL parser that will be much more error tolerant than straight grammar-based implementations could ever be.


Do you have those tricks written up anywhere?


Not really. I haven't taken the time to do any analysis, given that they're always just details of the languages I'm building (which are written up). I gave a talk about this at Parsing@SLE (2013?), but I guess there's no PDF of it.

They're really simple ideas (really, pre-match your braces; I'm sure I'm not the first one to do that!), so I'm not sure they're publishable.


I would like to see a codebase where this idea is used. I'm playing around with shunting yard ...


This is a fantastic overview. If you've ever wondered why JetBrains builds specific editor support for each language it supports (including, effectively, a compiler), this is why.

As the developer of an IDE for Clojure, I'm also very happy that one of the secret sauces is persistent data structures.


As a quick aside, Anders had committed code to the TypeScript compiler that same day. He also works in a team room with like 20 devs (not in his own plush window office). He told me he loves that kind of environment. The dude is a really cool dev.


This is a very good talk but I wonder if this alternate compiler design has actually made the TypeScript compiler slower in normal compilation mode. If you really do "helicopter in" to every point in the syntax tree and run IDE queries to implement type checking then that could potentially be much slower if there's any overhead at all to doing that.

I've been experimenting with programming languages and compilers myself (https://github.com/evanw/skew) and my compiler appears to be ~2x faster than tsc when run with node even though it's also doing constant folding, tree shaking, inlining, and minification while tsc is just removing type annotations (my compiler appears to be ~15x faster when run natively). The slow compilation speed of tsc is my main complaint about the TypeScript ecosystem.


If you run tsc with benchmarking on, you'll find that almost all the time is taken up by the typechecker. Increasingly, TypeScript is adding control-flow-analysis-based[1] and contextual type system features... Saying that it just 'removes' type annotations is a bit disingenuous, as processing the type constraints those annotations apply is one of the most time-consuming tasks TS can perform! TypeScript is structural, too, so for every comparison it can need to get down to the nitty-gritty of comparing individual object members... Recursively. TS does heavy caching of not just types, but the relationships between types - just to avoid redoing any work, where possible.

FYI, the 'helicoptering in' doesn't affect full-program compilation time because of the aforementioned caching. Since the state of the world is unchanged during a build, the parse tree and type hierarchy are only built once. Additionally, you'd find that incremental builds (tsc --watch) are relatively snappy because of the heavy caching done and the tree reuse, and so have benefited from the architecture improvements made for language service work.
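To give a flavor of what caching type relationships means, here's a toy TypeScript sketch of memoized structural assignability (the general idea only; tsc's real checker is vastly more involved):

    interface ObjType { id: number; members: Map<string, ObjType>; }

    // Cache of previously computed source->target relationships.
    const relationCache = new Map<string, boolean>();

    function isAssignableTo(source: ObjType, target: ObjType): boolean {
      const key = `${source.id}->${target.id}`;
      const cached = relationCache.get(key);
      if (cached !== undefined) return cached;
      // Optimistically assume success while recursing, so recursive types terminate.
      relationCache.set(key, true);
      let ok = true;
      for (const [name, targetMember] of target.members) {
        const sourceMember = source.members.get(name);
        // Structural: every member of the target must exist on the source
        // and be assignable, recursively.
        if (!sourceMember || !isAssignableTo(sourceMember, targetMember)) { ok = false; break; }
      }
      relationCache.set(key, ok);
      return ok;
    }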

[1] https://github.com/Microsoft/TypeScript/pull/8010


The compiler doesn't just remove the type annotations; it has to go through and check that the annotations are valid and that type safety is maintained throughout the program. Type checking is often not a quick process, since it requires every single value in the program to have its type verified. If I declare that a variable is an integer and set it to the result of a function, the compiler has to make sure that that function returns an integer and not some other type, or that the type the function returns can be implicitly converted to an integer. And for that to be safe, it has to first prove that the function being called can be given that type, etc.
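A concrete (simplified) TypeScript example of the kind of flow the checker has to verify:

    // The checker has to verify every flow like this, not just strip ": number":
    function parsePort(raw: string): number {
      return Number(raw);                     // checked: Number(...) returns number
    }

    const port: number = parsePort("8080");   // checked: number is assignable to number

    // If parsePort were declared to return string | number instead, tsc would
    // have to prove this assignment unsafe and reject it - real analysis, not erasure.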


I know what type checking is. :) Both Skew and TypeScript are type-checked languages. Just because a compiler does type checking doesn't mean it has to be a lot slower. I was just pointing out that it would be interesting if this alternate compiler design was the reason why the TypeScript compiler is so slow relative to another compiler for a similar language (object-oriented, typed, garbage collected, etc.) that also uses the node runtime, especially since that other compiler is doing even more work than tsc.


Are there books or more developed material on these strategies? I understand the concepts mentioned here but reading some implementation strategies would be helpful.


The C# compiler he is talking about is open source, so you could have a play around with it to see how they implemented it. https://github.com/dotnet/roslyn


Is it common to see people Anders's age as enthusiastic about CS as he is?


Well he clearly has a curious mind... I think you only get burnt out once you can't keep up, or once nobody is interested in the things you spent your career on.

But he happened to spend his career on something that is still profoundly interesting to people. The open source world is very much stuck in the old architecture he was talking about, so it appears that Microsoft's compilers are simply more advanced. (And Borland and JetBrains did this too, just not with open source compilers.)

There are several programmers at my company who are in their 60s, and I think even 70s, past retirement age. All of them are super sharp; they are not "code monkeys". I think the code-monkey types tend to leave the industry sooner.


He's 56. What's your point? Do you imagine that people who devote themselves to a field have some kind of expiry date built in?


Our expiry date is not hard-coded. It is a nondeterministic finite state machine.


CS is like mathematics or chess - if you find it interesting or beautiful there should be no expiry date as such.

What burns people out, IMO, is lack of autonomy or broken workplaces, not the subject matter.


Certainly common in my experience. A coworker of mine just celebrated his 57th birthday. He's definitely better at this stuff than I am; experience counts. The youth bias in this industry has always been largely based on illusion.


I think this applies to many people in our domain, provided they're careful not to burn out: https://popperfont.files.wordpress.com/2014/04/lifesatisfact...


It's very common for professors and researchers, which I guess he kind of is.


He clearly is not JavaScript fatigued.


It looks like they are building syntax trees in a way similar to how React builds the DOM tree - using functional programming and caching/diffing.
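A toy sketch of what that node reuse might look like (my own illustration of the general idea, not tsc's actual code; real implementations also shift node positions after the edit, which is omitted here):

    // Syntax tree nodes are immutable, so an incremental reparse can keep any
    // old subtree the edit didn't touch and only rebuild the spine above it.
    interface Node { start: number; end: number; children: readonly Node[]; }

    function reuseOrReparse(old: Node, editStart: number, editEnd: number,
                            reparse: (n: Node) => Node): Node {
      // Subtree entirely outside the edited range: safe to share as-is.
      if (old.end < editStart || old.start > editEnd) return old;
      // Leaf overlapping the edit: must be reparsed.
      if (old.children.length === 0) return reparse(old);
      // Otherwise rebuild this node, reusing whichever children we can.
      return {
        start: old.start,
        end: old.end,
        children: old.children.map(c => reuseOrReparse(c, editStart, editEnd, reparse)),
      };
    }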


So it's not that the classic compiler architecture had to change -- it's the addition of language services that broke the existing model.


If you look at how compilers are written these days, one other thing has changed. Almost every book on compiler design, including the classics, will direct you towards separating the lexing and parsing stages using generator tools like Lex/Yacc or Flex/Bison. In practice, this isn't done anymore. If you look at modern compilers like Clang, Go or Rust, they all have hand-coded parsers that build syntax trees right out of tokens. It turns out that parser generators don't really give you that much, and they only get in the way once you need to go beyond the basics.
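To give a flavor of the hand-written style, here's a minimal recursive descent expression parser in TypeScript that builds the tree directly from tokens, no generator involved (a toy, obviously, not taken from any of those compilers):

    type Expr = number | { op: "+" | "*"; left: Expr; right: Expr };

    function parse(src: string): Expr {
      const tokens = src.match(/\d+|[+*]/g) ?? [];
      let i = 0;

      function primary(): Expr { return Number(tokens[i++]); }

      function term(): Expr {   // handles '*', which binds tighter than '+'
        let left = primary();
        while (tokens[i] === "*") { i++; left = { op: "*", left, right: primary() }; }
        return left;
      }

      function expr(): Expr {   // handles '+'
        let left = term();
        while (tokens[i] === "+") { i++; left = { op: "+", left, right: term() }; }
        return left;
      }

      return expr();
    }

    // parse("1+2*3") -> { op: "+", left: 1, right: { op: "*", left: 2, right: 3 } }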


The classic compiler architecture did change, unless you want to maintain two compilers. That's the main point of the talk.


Yes, I'm just saying that the requirement was external to the original function of compilers.


Sure... they mentioned the fact that the TypeScript compiler barely generates code, if that's what you mean by the original function of compilers.

The architecture is completely dominated by the front end for usability and incrementality now. Generating JavaScript after all that is trivial.

You can imagine that during a typical program's development lifetime, 99% of the time is spent with the program in a non-working state, and 1% is when it's working. The compiler has to help with the 99% case now, not just the 1% case.


Typically, language services are decoupled from compilers. The 1% case still dominates how compilers function. Furthermore, compilers are typically not daemons, and build systems take care of caching, etc. Do you have other examples of consolidated tooling like that? It seems like a good idea, but I don't think it's the standard thing yet, or that everyone thinks it's a good idea.


Yes: the model of the programmer workstation as envisioned by Xerox PARC and followed up by ETHZ.

Smalltalk, Interlisp-D, and Mesa/Cedar at Xerox PARC, followed by Niklaus Wirth's work at ETHZ with the Oberon, Oberon-2, Active Oberon and Component Pascal environments.

Also the initial Ada workstation developed as the first product from Rational Software, and the Eiffel development environments.

And the Energize C++ environment created by Lucid after pivoting away from Lisp.



