Hacker News new | past | comments | ask | show | jobs | submit login

Does the C# parser implement incremental parsing by just using a recursive descent parser with caching, or does it do parsing with nonterminals as lookahead?



It does a full lex and parse. Then if there is an edit, it determines based on the edit which tokens need to be re-lexed; maybe we went from "[10,20]" to "[10.20]" and so now the [ and ] tokens are fine but everything in the middle is changed.

So we go from a data structure representing "original text plus edit" to a data structure representing "original lex plus changes". Now we have the information we need to do the same to the parse tree, which has also been stored. Given the set of tokens in the program which changed, and knowledge of where the textual boundaries are of every parse node, we can restrict the re-parse to the affected syntax tree spine. In the example given, we know that we've still got, say, an array index list but the contents of the list need to be re-parsed, so we re-start the parser on the left bracket.

The algorithm that does this is called "the blender", and reading that code makes my brain feel like it is in a blender. The code was written by Neal Gafter and based on his PhD thesis in incremental parser theory.

The source code is available on github; do a search for "roslyn" and you'll find it.




Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: