Shared Memory Versioning to improve slow interactions (chromium.org)
45 points by feross on June 3, 2024 | hide | past | favorite | 28 comments


> This is why we’ve been focused on identifying and fixing slow interactions from Chrome users’ field data, which is the authoritative source when it comes to real user experiences. We gather this field data by recording anonymized Perfetto traces on Chrome Canary, and report them using a privacy-preserving filter.

This is a great example of the positive value to the user of this kind of telemetry. Of course this doesn't mean people aren't also right to fear it's used in ways that aren't in their best interest. No clear moral here, just a data point.


Couldn't this particular issue have been discovered just as easily by visiting a bunch of websites in a lab, rather than collecting website performance data from end users?


Sure, but there's always the question of how realistic your test is in terms of websites you choose, the way you use them, your hardware setup, etc. I think their "Chrome users’ field data ... is the authoritative source when it comes to real user experiences" is totally valid.


There are always questions, like whether your telemetry is measuring things correctly or whether it introduces slowdowns of its own, so it's not that authoritative.

So this isn't a great example, because you can indeed see that many requests are redundant and frequent just by visiting a few sites: those facts don't depend on any particular hardware.


They might well depend on available hardware, given the specific nature of the problem (multiple processes and the level of concurrency within).

Ultimately the source of truth is actual end-user performance. If you’re not measuring that directly, then you’re hoping what you’re doing is a proxy for end-user performance.

Sometimes a suitable proxy is available. Often, it’s not.


I strongly disagree. If you just "visit a few sites", you'll almost certainly find something you can optimize, but you have no way of knowing how significant it is in terms of how many sites/user/visits it affects, how its performance impact compares to say decreasing CPU cache pressure, etc. Telemetry can answer this stuff, and it can be both lightweight and accurate. I'm a big fan of looking at actual performance data (though I've mostly worked on the server side where gathering it isn't as controversial).

The goal isn't (shouldn't be) to just find a bunch of things to optimize but instead to achieve the best balance between complexity and the experience for the vast majority of users given limited engineering resources.


Why would you visit a "few sites" when you can visit thousands?

> you have no way of knowing how significant it is in terms of how many sites/user/visits it affects

The way of knowing is getting site visiting statistics

> I'm a big fan of looking at actual performance data

And you'd be looking at exactly that: actual performance data on representative hardware and representative websites, but with more flexibility and more data available, since both parts are yours and you can collect anything without privacy concerns.


> Why wound you visit a "few sites" when you can visit thousands?

A "few sites" was a quote from your earlier comment. [1] You're moving the bar. Perhaps you're seeing it isn't quite as easy as you first thought to do a good job with this.

[1] https://news.ycombinator.com/item?id=40570930


The article doesn't mention why versioning was used instead of just storing the cookie string in shared memory. Something about atomicity, perhaps? i.e. you can't guarantee that a string is updated atomically without synchronisation, so synchronisation would be needed, which could cause blocking or cause requests to queue; whereas a simple version number can be written and read atomically?
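That atomicity guess can be illustrated with a minimal sketch (invented names, not Chromium's actual code): a single 64-bit counter in a shared-memory mapping can be read and written atomically, so readers never see a torn value, whereas a variable-length cookie string could not be updated in place without extra synchronisation.

```cpp
#include <atomic>
#include <cstdint>

// Hypothetical: stands in for a small shared-memory region mapped into
// both the browser process and each renderer process.
struct SharedCookieVersion {
  std::atomic<uint64_t> version{0};
};

// Renderer side: one lock-free load tells us whether the cookie snapshot
// we fetched at `cached_version` might be stale.
bool NeedsRefresh(const SharedCookieVersion& shm, uint64_t cached_version) {
  return shm.version.load(std::memory_order_acquire) != cached_version;
}
```

The browser process would bump `version` (e.g. with `fetch_add`) whenever any cookie changes; no mutex is needed on either side for this check.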


Probably security, too: cookie data is security-critical, since you can use it for session hijacking, etc. The version doesn't leak anything about the cookie's contents so it can be directly accessed without any security checks. The whole point of isolating the process running each website from other browser resources (like cookies) is an added layer of security.


Maybe because cookies can be of arbitrary size and you don’t want to keep them around all the time?


This is light on details. What's the typical way to propagate the version from shared memory to each renderer process so they know whether it's up to date or stale? Wouldn't a naive implementation have a thundering-herd problem?


Reading a version number from shared memory is basically free, and you already had the "thundering herd" before this fix - the herd was just issuing tons of RPCs to get the latest cookie. Now they check the version in shm first and early-out, skipping the RPC when the cookie hasn't changed.
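A rough sketch of that early-out read path (names are invented, not Chromium's): the renderer keeps its last cookie string alongside the shared-memory version it was fetched at, and a `document.cookie` read only pays for the IPC when the version has moved.

```cpp
#include <atomic>
#include <cstdint>
#include <functional>
#include <string>

// Per-renderer cache: the last cookie string and the version it matched.
struct CookieCache {
  std::string cookie_string;
  uint64_t version = 0;
};

// Hypothetical read path: `fetch_via_ipc` stands in for the slow RPC to
// the privileged process; the common case is just one atomic load.
std::string GetCookies(const std::atomic<uint64_t>& shared_version,
                       CookieCache& cache,
                       const std::function<std::string()>& fetch_via_ipc) {
  const uint64_t current = shared_version.load(std::memory_order_acquire);
  if (current != cache.version) {   // stale: pay for the IPC once
    cache.cookie_string = fetch_via_ipc();
    cache.version = current;
  }
  return cache.cookie_string;       // fresh: no IPC at all
}
```

Since each renderer pulls on its own read, there is no synchronized wake-up to stampede over.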


Yep that’s pretty much it. There’s no pushing of the version from the privileged process to the renderer, only pulling. So there’s not many renderers unblocking at the same time to create any kind of herd.


They possibly store just the version number for each cookie in shared memory, whereas storing complete cookie data in shared memory would need more synchronization.


Hey, one of the authors here. That’s pretty much it. I’m currently working on the next iteration of this, which would indeed share more. For now it’s not trivial, because it needs a cross-platform condition-variable-like abstraction that works across shared memory. The pthread-based one is not that bad and I’m hacking on it.


They don't quite spell it out fully, but it seems like a good guess that they use a datatype (like a 64-bit word) for the version number that can be read and written atomically, so no synchronization is needed at all and it can be very fast. Only when you don't have the latest version do you do a slower fetch of the updated value (which you can cache), with a mutex to make sure you don't get a partial update.


Right now it’s falling back on a mojo IPC. Next step is indeed shared memory mutex :)


You could come up with a scheme to store each version of the cookie in an accessible part of the SHM at a fixed address, but then you have to do GC to evict the oldest versions over time without causing problems etc. So I'd bet they are still retrieving the actual cookie contents via an RPC like before.


Not sure there is much of a herd, because document.cookie is only shared amongst same-site renderer processes.


Yeah. I was looking for something a little more meaty, but there are solid ways to attack this.

I like to use copy-on-write reference immutable tree storage with self-recycling/self-freeing reference counting shared memory objects. Entities can retain prior versions for as long as necessary and then pop over to new versions when they’re ready, wait free. For ultimate cache and TLB performance, some other approaches (including deliberate copies) can be useful, but I’ve found the approach to be very friendly to multi-thread and multi-process scale for a long time.
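A loose, single-process sketch of that copy-on-write idea (my own toy code, not the commenter's implementation): readers pin the current reference-counted immutable snapshot wait-free and keep it alive as long as they like, while a writer builds a new version and publishes it atomically. A real cross-process version would need a shared-memory allocator and refcounts instead of `shared_ptr`.

```cpp
#include <cstdint>
#include <memory>
#include <string>

// An immutable snapshot; `data` stands in for an immutable tree.
struct Snapshot {
  uint64_t version;
  std::string data;
};

class VersionedStore {
 public:
  // Wait-free for readers: the returned snapshot stays alive (via its
  // refcount) even if a writer publishes a newer version meanwhile.
  std::shared_ptr<const Snapshot> Current() const {
    return std::atomic_load(&current_);
  }

  // Writer: build a fresh immutable snapshot, then publish it atomically.
  void Publish(std::string data) {
    auto cur = std::atomic_load(&current_);
    auto next = std::make_shared<const Snapshot>(
        Snapshot{cur ? cur->version + 1 : 1, std::move(data)});
    std::atomic_store(&current_, std::shared_ptr<const Snapshot>(next));
  }

 private:
  std::shared_ptr<const Snapshot> current_;
};
```

Old versions free themselves when the last reader drops its reference, which is the "self-recycling" property described above.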


This is interesting stuff. I’m one of the authors of the change and working on the next iteration. If you have an example of what you describe I’d love to have a look!


"we … determined that it … improved the slowest interactions by approximately 5% … result[ing] in more websites passing Core Web Vitals"

Thanks, Google.


not even just that! "Combined with other changes" like ???


>We were astonished to discover that 87% of cookie accesses were redundant...

er. really? it has no "it has been updated" event, polling it even when unnecessary is kinda the only option. honestly I'm surprised it's only 87%, I would've bet it was much closer to 99%. tbh it makes me wonder if there's some incorrect caching going on in some popular frameworks.

>... and that, in some cases, this could happen hundreds of times per second.

ignoring intentional stuff like cross-tab communication via cookies (there are much better options nowadays, but not all code has switched), yeah - also not all that surprising, but definitely one of those "... but why?" discoveries.


... and now that they made it faster, the authors of popular frameworks can get away with even worse mistakes. Hooray for progress!


Between "this framework was requesting cookie data too often" and "this framework was incorrectly caching cookies and causing weird data inconsistency errors that frustrate users" the former is probably better, and they fixed that. The latter is something that isn't straightforward to fix without introducing the former issue.


100% agreed, I would much rather have things just hammer the browser and be trivially correct. Far easier to debug and layer in a cache if needed when you know your specific case, rather than trying to debug a flawed cache.



