Off-main-thread HTML parsing in Servo

vmarsy · on Aug 24, 2017

I really liked the blog post and was interested to see the end results but...

> I hope to publish another blog post describing [speculative parsing] thoroughly, along with details on the performance improvements this feature would bring.

Isn't there any preliminary results on the perf improvements this would bring?

I understand that the % of time when the speculative parsing succeeds (instead of having to roll back to a sequential parsing) can't be know easily, since it'll be based on the testing of a large number of real-world web pages,

but I'd love to see just 2 simple examples:

* One full of `document.write`, so we can have an idea of the time lost in case of failed speculation

* One 'embarrassingly parallel' dom tree where the speculation should pay off the most.

That would give us a good idea of worst case / best case results.

jgraham · on Aug 25, 2017

Presto (the old Opera engine) had a mode called "delayed script execution" which would speculate even harder than the gecko/servo approach. As I understand it — which might well be incorrectly — it would continue treebuilding after a script tag, and then if the script ran a document.write it would check if it produced something that wouldn't affect subsequent tokenisation (or, presumably alter the treebuilding state) and if so patch the tree in-place rather than throwing it away and restarting.

It was never debugged enough to be turned on on desktop, but I think it was considered useful enough to ship on mobile in the days when mobile performance was significantly worse than it is today. In particular, on a network where request latency is high having to pause treebuilding for scripts to download can significantly increase the time until it's possible to paint anything at all on the screen.

gsnedders · on Aug 25, 2017

My memory of DSE is that it parsed the entire document prior to doing any script execution; it didn't merely run the script when it became available. That said, I probably touched it only marginally more recently than you.

kevingadd · on Aug 25, 2017

With things like this sometimes they do the work without preliminary perf numbers because other browsers have already done it successfully. IIRC, at least one production browser already does off-main-thread parsing.

Touche · on Aug 25, 2017

Do you know which one? Is it speculative parsing like this one or does it stop at scripts?

bla2 · on Aug 25, 2017

Pretty sure all WebKit-based browsers (including chrome) do this. I'd guess they block if needed, but I'm not sure.

esprehn · on Aug 25, 2017

fwiw WebKit never turned on the threaded html parser (and deleted it after the fork [1]) and Blink removed it a bit ago too [2]. In Chrome we measured it across a number of sites and found that the time saved from background tokenization wasn't benefiting real world content enough to justify the cost and that in some situations it actually made things worse.

That's not to say Servo shouldn't try it of course! Part of having a healthy multi-browser ecosystem is each browser trying lots of ideas and coming up with new solutions to problems that were encountered in other implementations.

[1] http://trac.webkit.org/changeset/162260/webkit (the number cited here is a joke :P)

[2] https://groups.google.com/a/chromium.org/forum/#!topic/loadi... (see also the design doc linked from the email)

krzat · on Aug 25, 2017

I guess one of goals of Rust is to make these sort of optimisations viable (compiler provides safety nets). Will be nice to see what they can do.

hsivonen · on Aug 25, 2017

Gecko has had off-the-main thread HTML parsing since Firefox 4. Docs: https://developer.mozilla.org/en-US/docs/Mozilla/Gecko/HTML_...

WebKit/Blink put fewer parts of the HTML parser off the main thread, so negative results should not be taken to apply to more comprehensive approaches.

ibotty · on Aug 25, 2017

I don't get the joke. Were these defines never used? Did the commit removing the threaded html parser happen before?

bla2 · on Aug 25, 2017

I'd wager the feature required fewer than 8.8 million lines of code :-)

pcwalton · on Aug 25, 2017

Gecko/Firefox.

DiabloD3 · on Aug 25, 2017

So, anyone try Firefox Nightly lately, with Stylo finally landed in there? The browser actually feels fast now, I think I've finally made my decision to dump Chrome.

Garbage · on Aug 25, 2017

I have been using Stylo for some time now. I have faced exactly zero problems. I think it's rock solid. I don't think it's the default CSS style system though. You can enable it using layout.css.servo.enabled to true in about:config

KitDuncan · on Aug 25, 2017

You can already enable this flag and Firefox stable, but I don't think it takes effect yet.

NiLSPACE · on Aug 25, 2017

It only works in Firefox Nightly. The flag exists in Beta and Stable, but since Stylo isn't compiled with these versions the flag doesn't actually do anything.

infogulch · on Aug 25, 2017

Are you sure? I thought I saw a FF dev mention right here that the flag is available in stable just a couple days ago.

DiabloD3 · on Aug 25, 2017

From what I can tell, Beta 56 has it (but you have to enable it), Nightly 57 has it (https://bugzilla.mozilla.org/show_bug.cgi?id=1330412 is tracking when it'll be enabled by default), and Stable 55 does not have it at all (although the pref might be there).

leshow · on Aug 25, 2017

I've been a chrome user since it came out pretty much, I loved it. But I also love Rust, so I gave it a try when it came out in nightly. FF definitely 'feels' snappier and I've transferred all my bookmarks and everything over to FF sync without issues.

lewisl9029 · on Aug 25, 2017

Last time I checked, the Servo team also had plans to execute JS off the main thread and introduce async versions of DOM APIs to enable better parallelism: https://news.ycombinator.com/item?id=9011215

Servo is by far the technology I'm most excited about as a frontend developer. It will be a complete game changer for web performance once all the pieces are in place.

pcwalton · on Aug 25, 2017

There isn't really a main thread in Servo, because WebRender, layout, and JS/DOM per origin all live in separate threads.

tmzt · on Aug 25, 2017

Would you consider exposing an RPC interface to JavaScript and/or WASM to enable faster vdom operations?

Also, after reading that old HN comment, do you think adding something like RenderEvent with values like client size already defined would make sense?

Servo feels like a great platform for experiments like this which, if successful, could move into Firefox.

euyyn · on Aug 24, 2017

I wish he went into more detail explaining his solution to the synchronous queries, as the linked gist implies much more knowledge of Servo's internals than I have.

andy_ppp · on Aug 25, 2017

I wish browser vendors would make stand at this point and say in 1 year all major browser engines will stop supporting document.write - it's horrible for developers to use and no-one I know advocates it - but I'm fairly sure ad networks still use it for no reason whatsoever.

The implementation of browsers is already difficult enough! It's great to see the lengths Servo is going to to be compatible, but on this one I say warn developers and deprecate this feature that no-one in 2017 should ever use.

jgraham · on Aug 25, 2017

Unfortunately the reality is that a double digit percentage of all sites rely on document.write [1]. Whilst it's true that a lot of usage is advertising, it's also used for functionality in sites like Google Docs. Perhaps with a lot of work you could get the usage down to single-digit percentage figures, but disabling it entirely is going to break an awful lot of sites that are unmaintained. For something like NPAPI, where there is a real ongoing cost of support, breaking old content — with a multi year timeline for migration — might be worthwhile, but I think it's hard to make the case that document.write is in the same category. Yes, it's an old and terrible API that causes layering violations, makes writing scripted parsers much harder, and ideally authors would avoid it. But making it go away would be a huge project with relatively low payoff compared to other possible uses of the same effort.

[1] https://discuss.httparchive.org/t/how-and-where-is-document-...

acdha · on Aug 25, 2017

In case anyone else was curious about precisely what those numbers measured, I just confirmed that Lighthouse only reports the no-document-write violation when the page actually calls it so those numbers aren't high due to the sites which still have fallbacks left from the paleolithic-era Web (I'd seen that with Adobe Analytics/Omniture where the tracking code injection had a fallback path for Netscape 2/IE3 until very recently).

andy_ppp · on Aug 25, 2017

Okay sure, but I’m pretty certain about 50% of those site have ads on them. Getting rid of Flash was a much bigger issue but Apple forced it through with iOS and it immeasurably improved the experience (at least for those platformsand largely for everyone). Removing document.write improves page performance as well as browser engines. Everything document.write involves can and should be done in other ways, putting document.write behind a Site X wants to allow legacy Javascript do you wish to continue? Clicking no and the ads don’t load would not be a terrible situation.

Maybe there are other APIs that should be deprecated first but I cannot think of any.

Finally I am pretty certain you could write a document.write polyfill, I’ll have a crack at some point I think.

rossy · on Aug 25, 2017

> it's horrible for developers to use and no-one I know advocates it

I've seen developers advocate for document.write to inject locally-hosted JavaScript as a fallback for when CDNs are down: https://www.hanselman.com/blog/CDNsFailButYourScriptsDontHav...

I definitely don't agree with this method, but there you go. Apparently ASP.NET 4.5 will even insert this pattern automatically.

CrystalGamma · on Aug 25, 2017

You could still insert <script> elements, you just wouldn't do it with document.write().

pygy_ · on Aug 25, 2017

I may be wrong, but I think that document.write() will load the script synchronously while inserting a `script` element will not.

acdha · on Aug 25, 2017

You can set async=false on a dynamic script element to avoid the default asynchronous behaviour.

pygy_ · on Aug 25, 2017

I just looked at the support matrix: IE10+. So it is reasonably supported nowadays, nice :-)

acdha · on Aug 25, 2017

Working on the web is refreshing like that these days — so many things which used to be painful are now just standards + “Microsoft doesn't support that old version of IE and neither do we”.

sergeykish · on Aug 25, 2017

If only someone introduced document format which disables document.write like https://www.w3.org/MarkUp/2004/xhtml-faq#docwrite ... one can only dream how wonderful life would be fifteen years later

amelius · on Aug 25, 2017

> The implementation of browsers is already difficult enough!

Clearly. I would go further and say that we should move all parsing and rendering into the browser's "user space". That way, browser vendors can concentrate on a simple set of primitives, and improve security, robustness, and compatibility, while users get more freedom and flexibility.

andy_ppp · on Aug 25, 2017

A rendering engine written in Javascript? Actually more possible with WebAssembly I suppose, but I’d worry about user space browser leaking information.

pas · on Aug 25, 2017

https://www.smashingmagazine.com/2016/03/houdini-maybe-the-m...

though I don't know what has became of that project

gsnedders · on Aug 26, 2017

It's happening, slowly. CSS Typed OM should be shipping soonish (it essentially provides a new object-model around CSS which doesn't just treat everything as opaque strings like CSSOM does), the other parts are still fairly early-stage work.

Touche · on Aug 25, 2017

Is it possible to use Servo yet? Last time I tried to use it, it was unusably slow on all websites I tested with. It was always weird to read about how fast it was when my (albeit limited) experience was the opposite. Is browser.html still the preferred way to test?

dbaupp · on Aug 25, 2017

As I understand it, a lot of pieces critical for real world use are really just minimum viable implementations to allow Servo to work end-to-end and feed data into the interesting parts. In particular, things like the network stack don't benefit nearly as much from aggressive parallelism and so little effort has been focused on them, not even bringing them up to level with the techniques other browsers use.

Touche · on Aug 25, 2017

Thanks, I can understand how that might be the case (that many parts are MVIs, as you said). My issues were never with things like networking, it was more that scrolling was really bad, and stuff rendered slowly.

pcwalton · on Aug 25, 2017

WebRender has matured a lot lately; it's even in Firefox nightlies behind an about:config preference. That said, there are still rough edges. If you see rendering slowness, feel free to file issues on http://github.com/servo/webrender.

There are no silver bullets for performance, only lots of lead bullets. I'm confident that parallelism via GPU and multicore is a large architectural win. But lots of making the Web fast involves a ton of little special case optimizations that just need to be implemented. That's how Stylo went: promising benchmarks and a mediocre user experience at the research stage turned into great benchmarks and a great user experience with production engineering work.