
I once had the same fear, but when I look back at my education I realize that coding is not what forced me to learn lower-level concepts. I learned how a computer works through deliberate study of those lower-level concepts, and that learning translated into a better understanding of code. The kids will be alright as long as study remains a common practice.


Why is that an anti-pattern? Databases have added `SKIP LOCKED` and `SELECT FOR UPDATE` to handle these use cases. What are the downsides?


as with everything, it depends on how you're processing the queue.

eg we built a system at my last company to process 150 million objects / hour, and we modeled this using a postgres-backed queue with multiple processes pulling from the queue.

we observed that, whenever there were a lot of locked rows (ie lots of work being done), Postgres would correctly SKIP these rows, but having to iterate over and skip that many locked rows did have a noticeable impact on CPU utilization.

we worked around this by partitioning the queue, indexing on partition, and assigning each worker process a partition to pull from upon startup. this reduced the # of locked rows that postgres would have to skip over because our queries would contain a `WHERE partition=X` clause.
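
roughly, the claim query looked something like the sketch below (table and column names here are made up for illustration, not our actual schema):

```sql
-- hypothetical claim query: each worker scans only its assigned partition,
-- so it iterates over far fewer rows locked by other workers
SELECT id, payload
FROM queue
WHERE partition = $1      -- partition assigned to this worker at startup
  AND status = 'pending'
ORDER BY id
LIMIT 100
FOR UPDATE SKIP LOCKED;
```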

i had some great graphs on how long `SELECT FOR UPDATE ... SKIP LOCKED` takes as the number of locked rows in the queue increases, and how this partition workaround reduced the time to execute the SKIP LOCKED query, but unfortunately they are in the hands of my previous employer :(


How did you get from the original post's "low level of load" to overengineering for "150 million objects/hr"?

Is the concept of having different solutions for different scales not familiar to you?


I did something similar. Designed and built for 10 million objects / hour. Picked up by workers in batches of 1k. Benchmark peaked above 200 million objects / hour with PG in a small VM. Fast forward two years, the curse of success strikes, and we have a much higher load than designed for.

Redesigned to create batches on the fly and then `SELECT FOR UPDATE batch SKIP LOCKED LIMIT 1` instead of `SELECT FOR UPDATE object SKIP LOCKED LIMIT 1000`. And just like that, 1000x reduction in load. Postgres is awesome.
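
To sketch the shape of the change (illustrative table and column names, not the real schema), the workers went from claiming many objects to claiming one pre-built batch:

```sql
-- Before: each worker claims up to 1000 individual objects.
SELECT id
FROM objects
WHERE status = 'pending'
LIMIT 1000
FOR UPDATE SKIP LOCKED;

-- After: each worker claims a single pre-built batch of object ids,
-- so Postgres only has to lock (and other workers skip) one row per worker.
SELECT batch_id, object_ids
FROM batches
WHERE status = 'pending'
LIMIT 1
FOR UPDATE SKIP LOCKED;
```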

----

The application is for processing updates to objects. Using a dedicated task queue for this is guaranteed to be worse. The objects are picked straight from their tables, based on the values of a few columns. Using a task queue would require reading these tables anyway, but then writing them out to the queue, and then invalidating / dropping the queue should any of the objects' properties update. FOR UPDATE SKIP LOCKED allows simply reading from the table ... and that's it.


smart. although, i guess that pushes the locking from selecting queue entries to making sure that objects are placed into exactly 1 batch. curious if you ran into any bottlenecks there?


> ... making sure that objects are placed into exactly 1 batch. curious if you ran into any bottlenecks there?

A single application-layer thread doing batches of batch creation (heh). Not instant, but fast enough. I did have to add a 'batchmaker is done' check to the 'no batches left' condition for worker exit.

> ... that pushes the locking from selecting queue entries to ...

To selecting batches. A batch is immutable once created. If work has to be restarted to handle new/updated objects, all batches are wiped and the batchmaker (and workers, anyway) start over.


I believe the article and parent comment were discussing queue solutions for low-volume situations.


completely missed this. apologies.


40,000 per second is waaaaay beyond the point where you should be using a dedicated queuing solution. Even dedicated queues require tuning to handle that kind of throughput.

(or you can just use SQS or google cloud tasks, which work out of the box)


I hit 60k per second in 2020 on a 2-core, 100GB SSD installation of PG on GCP. And "tuning" PG is way easier than any dedicated queueing system I've seen. Does there exist a dedicated queueing system with an equivalent to EXPLAIN (ANALYZE)?


If that's true, you managed to do much better than these folks:

https://softwaremill.com/mqperf/

Maybe you should write a letter?


It's possible the person you're replying to wasn't using replication, so it's entirely different. Those folks also used "synchronous_commit is set to remote_write" which will have a performance impact


This is correct. My use-case was safe with eventual consistency, so I could've even used `synchronous_commit=off`, but I kept it to 'local' to get a baseline. Was happy with the 60k number I got, so there was no need for 'off'.
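
For reference, this is a per-session Postgres setting, so it's easy to experiment with. The choice below reflects my workload, not a general recommendation:

```sql
-- What the benchmark ran with: commits are flushed locally,
-- with no wait on any replicas.
SET synchronous_commit = local;

-- Would also have been acceptable here, since the workload tolerates
-- eventual consistency (a crash can lose the last few commits):
-- SET synchronous_commit = off;
```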

But I think the biggest reason I hit that number so easily was the ridiculous ease of batching. Starting with a query that selects one task at a time, "converting" it to select multiple tasks instead is ... a change of a single integer literal. FOR UPDATE SKIP LOCKED works the same regardless of whether your LIMIT is 1 or 1000.
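
Concretely, something like this (table and column names invented for the example):

```sql
-- Claim a single task:
SELECT id FROM tasks
WHERE status = 'pending'
LIMIT 1
FOR UPDATE SKIP LOCKED;

-- Claim a batch of 1000 tasks: the only change is the integer in LIMIT.
SELECT id FROM tasks
WHERE status = 'pending'
LIMIT 1000
FOR UPDATE SKIP LOCKED;
```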


I worked at a shop that had to process about 6M RPS for 5 seconds at a time, once a minute or so. That looked a lot like a boatload of Python background threads queueing work in memory then flushing them out into Cassandra. That was a fun little project.


> 150 million objects / hour

Is not low volume, unless it can be done in batches of hundreds.


completely missed this. apologies.


I wish this article spent more time on identifying personality conflicts. It's not clear from reading how a personality conflict differs from an idea conflict.

One highlight from the article is the culture of experimentation. Experimentation is often a requirement for resolving conflict on technical teams, since facts don't lie.


This looks like a great product that could really help smaller orgs get through SOC2 while leveraging engineering to automate as much as possible. I base that statement on the extensibility offered here.

Are there any orgs that are self-hosting that can offer testimonials?


Hey, Probo founder here.

We are helping several companies achieve compliance using our MVP and are building the open-source software in parallel. You can check our progress here: https://github.com/getprobo/probo

As it is still under construction, there are no testimonials yet for the self-hosted part.

But happy to help you if we can, feel free to reach out.


If I’m understanding correctly, the MVP and OSS are separate. If correct then I have to ask if the OSS version is in a usable state? Regardless, I’ll keep an eye on this project. Kudos.


This is technically correct but disingenuous. It reminds me of the climate change comic where a scientist asks, "What if climate change is a big hoax and we create a better world for nothing?"


Does anyone have experience with this? How does it compare to say Redshift?


Hi, disclosure of interests: I am a Developer Advocate for Apache Doris.

If you'd like to learn about more Apache Doris use cases, you can read our user stories (https://doris.apache.org/blog?currentPage=1&currentCategory=...) or join our community on Slack (https://join.slack.com/t/apachedoriscommunity/shared_invite/...)

And this is a functional comparison between Apache Doris and Redshift: https://drive.google.com/file/d/1brUjcu_Lgisy4oa-g9hNvktz0ca...


Can you give an example of a non-programming task where errors can be serious/fatal?

In my experience, the programmers are closest to the technical controls in the organization. In matters of compliance and security, the programmers will often play a key role in documenting existing practices and suggesting ways of meeting requirements.

Maybe you had something else in mind.


Designing industrial equipment that needs human operators


For example, do programmers avoid driving?


The closest I’ve ever seen are programmers that avoid data collection and surveillance. Like avoiding social media and using an ad-blocker. Not exactly mortal risk aversion.

