Not sure why all the hostility here - you haven't seen the code, know nothing about the domain and yet you're certain that our performance requirements are false and that "code base got sacrificed" (apparently by adding two lines of rather self-explanatory code?)
Feels like you've just read grugbrain.dev and decided to shoot your golden tips at everybody without actually trying to understand the situation.
Anyway, there's one good point here:
> why is your allocator in this path, then?
Because those prices change 24/7/365, a million times a day, and so refreshing happens pretty much all the time in the background, eating CPU time. What's more, calculating prices is much more complicated than a hashmap lookup - hotels can have a dynamic number of discounts, taxes etc., and they can't all be precomputed (too many combinations).
You know, not all complexity is made up, a little trust in others won't hurt.
I agree that the comment sounded pretty hostile, but I also agree with the assessment that it might be better to avoid allocation in general. I know the code in the post is highly simplified, but you aren't exactly fully using "lazy iterators", at least in the post: refresh/load_hotels is still allocating new Hotels, each of which contains a one-million-element Vec. If you could reuse the existing self.hotels[id].prices Vec allocations, that might help, at least in the example code.
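To make that concrete, this is roughly what I had in mind - purely hypothetical types shaped like the simplified ones in the post, not your real code:

```rust
struct Hotel {
    prices: Vec<f64>,
}

struct State {
    hotels: Vec<Hotel>,
}

impl State {
    // Refresh prices in place, reusing each hotel's existing `prices`
    // buffer instead of allocating a brand-new Vec on every pass.
    fn refresh_in_place(&mut self, recompute: impl Fn(usize, &mut Vec<f64>)) {
        for (id, hotel) in self.hotels.iter_mut().enumerate() {
            hotel.prices.clear(); // drops old values, keeps the capacity
            recompute(id, &mut hotel.prices); // fills the same buffer again
        }
    }
}
```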
On second glance, I guess this is what you're getting at with the ArcSwap comment in the post. It sounds like you really do want to reallocate the whole State and atomically swap it with the previous state, which would make reusing the prices Vec impossible, at least without keeping around two States and swapping between them.
Anyway, tone aside, I still think the comment had validity in a general optimization sense, even if not in this specific case.
Yeah, the example is not representative - in reality there are a couple of smaller vectors (base prices, meal prices, supplements, discounts, taxes, cancellation policies, ...) plus lots of vectors containing indices (`for 2018-01-01, match prices #10, #20, #21`, `for 2018-01-02, match prices #11, #30`, ...), and they can't actually be updated in-place, because that would require using `RwLock`, which would prevent the engine from answering queries during refreshes.
(which is actually how it used to work a couple of years ago - ditching `RwLock` for `ArcSwap` and making the entire engine atomic was one of the coolest changes I've implemented here, probably worth its own post)
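For anyone curious, the reader/writer split is roughly this shape - heavily simplified, with made-up field names, but it shows why in-place updates are off the table:

```rust
use std::sync::Arc;
use arc_swap::ArcSwap;

// Made-up, minimal State; the real one holds the price/discount/tax
// vectors plus the per-date index vectors mentioned above.
struct Hotel {
    prices: Vec<f64>,
}

struct State {
    hotels: Vec<Hotel>,
}

struct Engine {
    state: ArcSwap<State>,
}

impl Engine {
    // Query path: a lock-free load of the current snapshot.
    fn query(&self, hotel_id: usize) -> Option<f64> {
        let state = self.state.load(); // cheap, never blocks the refresher
        state.hotels.get(hotel_id)?.prices.first().copied()
    }

    // Refresh path: build a whole new State off to the side, then
    // publish it atomically. In-flight queries keep reading the old
    // snapshot until they're done with it.
    fn refresh(&self, new_state: State) {
        self.state.store(Arc::new(new_state));
    }
}
```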
Makes perfect sense to me for the updates to happen atomically and avoid causing lock contention, even if that makes the loader more allocation-happy than it'd be otherwise. I've done similar things before.
What about the query path? Your post talked about a 10% improvement in response latency from changing memory allocators. That could be due to something like one allocator making better use of huge pages and thus vastly decreasing TLB pressure... but it could also be due to heavy malloc/free cycles during the query getting sped up. Is that happening, and if so, why are those allocations necessary? Ignoring the tone, I think this is more what akira2501 was getting at. My inclination would be to explore using per-request arenas.
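Something along these lines, e.g. with the bumpalo crate - just a sketch against made-up types and a made-up lookup function, not your actual query path:

```rust
use bumpalo::Bump; // bumpalo with the "collections" feature enabled
use bumpalo::collections::Vec as BumpVec;

// Hypothetical query handler: all the scratch data lives in a
// per-request arena and is freed in one shot when the arena drops.
fn handle_query(hotel_ids: &[usize], lookup_price: impl Fn(usize) -> f64) -> f64 {
    let arena = Bump::new();

    // Scratch vector allocated inside the arena, not the global allocator.
    let mut candidates = BumpVec::with_capacity_in(hotel_ids.len(), &arena);
    for &id in hotel_ids {
        candidates.push(lookup_price(id));
    }

    // ... whatever filtering/aggregation the query actually does ...
    candidates.iter().copied().fold(f64::INFINITY, f64::min)

    // `arena` is dropped here: one deallocation for the whole request.
}
```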
Per-request arenas are a nice idea, yeah - we haven't pursued the optimizations more, because we're satisfied with the current speed, but certainly there are some things that could be improved.
Out of interest, is there no way to rearchitect the whole thing to be event-based, ie more like a producer-consumer situation? Or do you have to loop back through all the source data and poll every hotel to fetch its current prices?
Sure, it is producer-consumer - we use Postgres' LISTEN/NOTIFY mechanism (mostly because we have no other use cases for queueing, so "exploiting" an already existing feature in Pg was easier).
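The plumbing is roughly this, sketched with the synchronous postgres crate (plus fallible_iterator for the notification iterator) and made-up channel/connection names - our real setup differs in the details:

```rust
use fallible_iterator::FallibleIterator; // the iterator trait postgres uses
use postgres::{Client, NoTls};

fn main() -> Result<(), postgres::Error> {
    // Producer side (wherever prices get written):
    //   SELECT pg_notify('hotel_changed', hotel_id::text);
    // e.g. fired from a trigger or right after the UPDATE.

    // Consumer side: the engine listens and queues hotel ids for the
    // next incremental refresh pass.
    let mut client = Client::connect("host=localhost user=engine dbname=prices", NoTls)?;
    client.batch_execute("LISTEN hotel_changed")?;

    let mut notifications = client.notifications();
    let mut iter = notifications.blocking_iter();
    while let Some(note) = iter.next()? {
        let hotel_id: usize = note.payload().parse().unwrap_or(0);
        println!("hotel {hotel_id} changed, scheduling refresh");
    }
    Ok(())
}
```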
The example in the article shows all hotels getting refreshed, but that's just because it's the quickest way to reproduce the problem locally. In reality we refresh/reindex only those hotels which have changed since the last refresh - over the course of days this accumulates, and the OOM was actually happening not immediately on the first refresh, but after a couple of days (which is part of what made it difficult to catch).
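Sketched out, the incremental pass looks something like this - again with made-up types, and assuming each hotel sits behind its own Arc so unchanged hotels can be shared between snapshots rather than recomputed:

```rust
use std::collections::HashSet;
use std::sync::Arc;
use arc_swap::ArcSwap;

struct Hotel {
    prices: Vec<f64>,
}

struct State {
    hotels: Vec<Arc<Hotel>>,
}

fn incremental_refresh(
    state: &ArcSwap<State>,
    changed: &HashSet<usize>,
    recompute: impl Fn(usize) -> Hotel,
) {
    let old = state.load_full(); // Arc<State> snapshot of the current state

    // Cheap Arc clones for untouched hotels, fresh data only for changed ones.
    let hotels = old
        .hotels
        .iter()
        .enumerate()
        .map(|(id, hotel)| {
            if changed.contains(&id) {
                Arc::new(recompute(id))
            } else {
                Arc::clone(hotel)
            }
        })
        .collect();

    // Publish the new snapshot atomically; in-flight queries keep the old one.
    state.store(Arc::new(State { hotels }));
}
```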
> Feels like you've just read grugbrain.dev and decided to shoot your golden tips at everybody without actually trying to understand the situation.
This feels like you've taken this personally or are projecting.
I'm not sure if you posted this to Hacker News as some sort of marketing exercise, but this is _Hacker_ News - there should be an expectation that some people are going to take a critical view of your post.
> Because those prices change 24/7/365, a million times a day, and so refreshing happens pretty much all the time in the background, eating CPU time. What's more, calculating prices is much more complicated than a hashmap lookup - hotels can have a dynamic number of discounts, taxes etc., and they can't all be precomputed (too many combinations).
I get that you have to recalculate things; I'm still not entirely sure how you ended up with 10% of your overhead being in malloc while doing it. That's pretty unusual and, almost anywhere, would be considered a code smell.
> a little trust in others won't hurt.
That's not why I'm here and I'm very sure that's not why you've posted this here either.