I had a GCP Cloud Run function that rendered videos. It was fine for one video per request, but after that it slowed to a crawl and needed to shut down to clear out whatever was wrong. I assume a memory leak in MoviePy? I spent a couple of days looking at multiple options and trying different things; in the end I just duplicated the service so I had three of them, rotated which one we sent video renders to, and did each render one at a time. It was by far the cheapest solution, and since it meant we processed renders in parallel rather than serially it was faster too. All in all it worked a treat.
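The rotation described above is just round-robin dispatch, which can be sketched in a few lines. This is a minimal sketch, not the author's actual code; the service URLs and function name are made up:

```python
from itertools import cycle

# Hypothetical URLs for the three duplicated Cloud Run services
SERVICES = [
    "https://render-1-example.run.app",
    "https://render-2-example.run.app",
    "https://render-3-example.run.app",
]

# cycle() yields the services in order, forever, wrapping around at the end
_rotation = cycle(SERVICES)

def next_service() -> str:
    """Return the next service URL in the round-robin rotation."""
    return next(_rotation)
```

Each incoming render request is then sent to `next_service()`, so every instance handles one render at a time and gets a breather (or a fresh instance) before its next one.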
Hah, welcome to Cloud Run! I was evaluating it a few years ago to host an internal corporate app.
It worked great and was way easier to deploy than a k8s setup. However, after some testing we found that the core logic of the app, a long-running process, would grind to a halt after some time.
It turned out Google wanted to push you toward their (paid) queue / pubsub solutions, but they didn't want it to be _too_ obvious, so Cloud Run would actually throttle the CPU sometime after the request that spawned the instance had returned.
Our logic was based on pushing work into a queue and having it processed outside of a request, and the throttling just f*ked with that approach.
And it would have been fine if that were upfront info, but it was buried in small print on some obscure doc page…
That was the whole point of Cloud Run, which was obvious when you looked at the pricing for it: if an instance is not processing requests, the CPU is not allocated and you're not charged.
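For what it's worth, Cloud Run does expose this as a setting nowadays: you can opt into keeping the CPU allocated between requests, which is what background work outside the request cycle needs, at the cost of being billed for instance time. A sketch with the gcloud CLI (the service name is made up):

```shell
# Keep the CPU allocated even when no request is in flight, so work
# that continues after the response is sent doesn't get throttled.
# Note: billing shifts from per-request to instance-based.
gcloud run services update my-service --no-cpu-throttling
```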
This reminds me of a service I recently found that was routinely crashing out and being restarted automatically. I fixed the crash, but it turns out it had ALWAYS been crashing on a reliable schedule - and keeping the service alive longer created a plethora of other issues, memory leaks being just one of them.
That was a structural crash and I should not have addressed it.
At Fastmail, the ops team ran failovers all the time, just to get our failovers so reliable they worked no matter what. Only once in my tenure did a failover fail, and in that case there was a --yolo flag involved.
At Reddit we would randomly select a process to kill every 10 minutes out of the 10 or so on each machine, just so they would all get restarted in case we didn't do a deployment for a few days.
At Amazon they schedule service bounces during code freeze for any service known to have memory leaks, because it's easier than finding the leak. It's not usually an issue the rest of the year, since services get redeployed so often anyway.
Oooh, you've just reminded me of the email server at my first dev job. It would crash every few days and no one could work out why. In the end someone just wrote a cron-job-type thing to restart it once a day, problem solved!
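That kind of fix is often just one crontab line. A sketch, assuming the server runs as a systemd unit with a hypothetical name:

```shell
# crontab entry: restart the mail server every night at 03:00
# (unit name is made up; the original could as easily have been
#  an init script or a kill-and-relaunch wrapper)
0 3 * * * /usr/bin/systemctl restart mailserver.service
```

Inelegant, but when the restart is cheaper than the root cause, it's a perfectly rational trade, as the Amazon and Reddit stories above suggest.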