Wow, that still sounds really long for simply tokenizing a file. I worked on parsers a while ago, and for reference I benchmarked e.g. the Python parser at 300,000 LOC/second (for tokenization and building an AST) on my machine (an i7 laptop). Also, tokenization complexity should not increase quadratically with the length of the file.

You probably know what you’re doing, just curious why these numbers seem so far off from what I would expect. What approach did you use for tokenization, if I may ask?




I don't recall the exact numbers, to be honest. I know the original took many seconds, and in the end it was under one.

As mentioned in another comment:

The Go extension is a thin wrapper around standard Go tooling; we weren’t tokenizing ourselves, just converting between its tokens and ones we could process. A large part of that was converting from byte offsets to UTF-8 character offsets.
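
Roughly what that conversion looks like, as an illustrative sketch (the name and shape are mine, not the actual extension code): count the runes in the prefix of the source up to the byte offset.

    package main

    import "unicode/utf8"

    // byteToCharOffset converts a byte offset in src into a character (rune)
    // offset by counting the runes in the prefix up to byteOff.
    // Naive version: it rescans from the start of the file on every call,
    // which is O(n) per token and O(n^2) over a whole file of tokens.
    func byteToCharOffset(src []byte, byteOff int) int {
        return utf8.RuneCount(src[:byteOff])
    }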

The quadratic behavior was a bug caused by reconverting segments over and over again instead of converting deltas between previously converted subsegments.
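
A sketch of the delta approach (again illustrative, not the real code): remember the last converted position and count runes only in the span since then, so each byte of the file is scanned once overall.

    package main

    import "unicode/utf8"

    // offsetConverter converts monotonically increasing byte offsets into
    // character (rune) offsets, counting runes only in the delta since the
    // previous conversion instead of rescanning from the start of the file.
    type offsetConverter struct {
        src      []byte
        lastByte int // byte offset at the previous conversion
        lastChar int // rune offset at the previous conversion
    }

    func (c *offsetConverter) toCharOffset(byteOff int) int {
        // Only the newly covered segment is scanned.
        c.lastChar += utf8.RuneCount(c.src[c.lastByte:byteOff])
        c.lastByte = byteOff
        return c.lastChar
    }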



