To me it's fascinating that you indeed _can_ speed up what appears to be a non-parallelisable task like parsing HTML (or JSON to a lesser extent) using SIMD. And that the gains are so substantial too.
There's a surprising number of things that can be parsed in a parallel fashion if one tries hard enough, though keeping significant (or any) gains is trickier. The operation in the article, though, is only a rather tiny part of parsing HTML.
Can anyone recommend pointers to find out more about creating programming languages / markup languages that would be more parallel-friendly for parsing? Or maybe even embarrassingly parallel? How does that affect the grammar, etc.? Maybe you need a different formalism rather than BNF to describe it, and you don't use regex to tokenize, and there is no recursive descent or state machines in parsing, etc.
There's not really that much specific to change, I think; the general thing is just reducing the number of rules, as in a parallel setup you'll be applying every rule to every part of the input, versus only the rules allowed by the preceding context (there is the option of filtering which parts to process if a bunch of rules apply only to the same small subset of the input, but that requires such context).
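Loosely, the contrast looks like this (a toy C sketch of my own, names made up): classify every byte against every class independently, instead of advancing a single state machine whose next step depends on the previous byte:

    #include <stddef.h>
    #include <stdint.h>

    enum { C_OTHER = 0, C_LT = 1, C_GT = 2, C_SLASH = 3, C_SPACE = 4 };

    /* Every iteration is independent of the previous one, so the loop
       vectorizes/parallelizes trivially; a serial tokenizer would instead
       carry a state variable from byte to byte. */
    void classify(const char *buf, size_t len, uint8_t *cls) {
        for (size_t i = 0; i < len; i++) {
            char b = buf[i];
            cls[i] = (b == '<') ? C_LT
                   : (b == '>') ? C_GT
                   : (b == '/') ? C_SLASH
                   : (b == ' ' || b == '\n' || b == '\t') ? C_SPACE
                   : C_OTHER;
        }
    }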
Are you still operating on a single stream of characters? I was wondering about something more radical, like you start in the middle of the stream, one thread parses forward to the end, and the other parses backwards towards the beginning.
That's an option for a potential 2x boost, which I suppose could be good enough in certain contexts, but that's it. In particular, if you have a parser system able to utilize SIMD for all parsing, that'll get arbitrary parallelizability for "free" (at the cost of some log(n) increase in total operations across all threads).
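A rough toy sketch of why the dependency chain isn't fundamental (my own C example, simplified to parens rather than real tags): a running quantity like nesting depth is just an associative sum, so each chunk can be processed independently and the results stitched together afterwards.

    #include <stddef.h>

    /* Pass 1 (parallel per chunk): net depth change contributed by a chunk. */
    int chunk_delta(const char *buf, size_t len) {
        int d = 0;
        for (size_t i = 0; i < len; i++)
            d += (buf[i] == '(') - (buf[i] == ')');
        return d;
    }

    /* Pass 2 (cheap combine over chunks): exclusive prefix of the deltas
       gives each chunk's starting depth, after which absolute depths per
       byte can be filled in by any thread independently. */
    void chunk_offsets(const int *delta, size_t nchunks, int *start) {
        int acc = 0;
        for (size_t c = 0; c < nchunks; c++) { start[c] = acc; acc += delta[c]; }
    }

The combine step can also be done as a tree reduction, which is where the log(n) extra work comes from.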
The approach to these vectorized parsers is generally to find some part of the parsing task that doesn't have a dependency chain through every dang byte, like a naively written parser would, then do that first, then do some harder stuff after. I'm pretty used to it now so I lack perspective on whether this is weird, but in general finding long dependency chains and getting rid of them or replacing them with several shorter dependency chains will make software a lot faster.
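For example, something like this (toy C/SSE2, not the article's code) pulls every '<' position out of the input with no per-byte dependency chain; the trickier, stateful work then only has to walk those candidate positions:

    #include <emmintrin.h>
    #include <stdint.h>
    #include <stddef.h>

    /* Collect offsets of every '<' in buf, 16 bytes at a time.
       __builtin_ctz is GCC/Clang-specific. */
    size_t find_open_tags(const char *buf, size_t len, size_t *out) {
        size_t n = 0, i = 0;
        const __m128i needle = _mm_set1_epi8('<');
        for (; i + 16 <= len; i += 16) {
            __m128i chunk = _mm_loadu_si128((const __m128i *)(buf + i));
            uint32_t mask =
                (uint32_t)_mm_movemask_epi8(_mm_cmpeq_epi8(chunk, needle));
            while (mask) {                       /* iterate the set bits */
                out[n++] = i + (size_t)__builtin_ctz(mask);
                mask &= mask - 1;
            }
        }
        for (; i < len; i++)                     /* scalar tail */
            if (buf[i] == '<') out[n++] = i;
        return n;
    }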
As a very rough approximation, I suspect that html parsing (beyond the streaming of the data anyhow) isn't really all that fundamentally serial. After all, if you seek into an html document at some random point, you can generally interpret the html structure mostly locally, i.e. it's likely possible to subdivide the task of parsing an html document.
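E.g., a crude way to subdivide (hypothetical sketch, ignoring context-sensitive cases like script and comments): pick split points near even divisions of the buffer and snap each one forward to just past the next '>', so every chunk starts roughly at a tag boundary and can be tokenized independently before a serial fixup pass.

    #include <stddef.h>
    #include <string.h>

    /* Move a tentative split point forward to just after the next '>',
       or to the end of the buffer if there isn't one. */
    size_t snap_to_tag_end(const char *buf, size_t len, size_t pos) {
        const char *p = memchr(buf + pos, '>', len - pos);
        return p ? (size_t)(p - buf) + 1 : len;
    }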
Given how leniently html deals with errors, and that some of the parsing rules have context sensitivity (the various modes), actually exploiting large-scale parallelism is probably a very difficult bit of engineering. Additionally, there may be corner cases with much more context dependence than normal. Also, parsing is probably already not a huge part of a browser's runtime, and aggressive parallelism might be faster but more power-hungry for a very small latency win. But I can't think of a specific fundamental limitation that prevents the normal parsing case from being quite parallelizable; I believe it's "just" an engineering problem.