I did something similar. Designed and built for 10 million objects / hour, picked up by workers in batches of 1k. Benchmark peaked above 200 million objects / hour with PG in a small VM. Fast forward two years, the curse of success strikes, and we have a much higher load than designed for.
Redesigned to create batches on the fly and then `SELECT FOR UPDATE batch SKIP LOCKED LIMIT 1` instead of `SELECT FOR UPDATE object SKIP LOCKED LIMIT 1000`. And just like that, 1000x reduction in load. Postgres is awesome.
----
The application is for processing updates to objects. Using a dedicated task queue for this is guaranteed to be worse. The objects are picked straight from their tables, based on the values of a few columns. Using a task queue would require reading these tables anyway, but then writing them out to the queue, and then invalidating / dropping the queue should any of the objects' properties update. FOR UPDATE SKIP LOCKED allows simply reading from the table ... and that's it.
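For illustration, a minimal sketch of the two claim queries written out as valid PostgreSQL and wrapped in Python. The table and column names (`object`, `batch`, `needs_update`, `id`) are assumptions made up for the example, not the actual schema, and `claim_batch` is a hypothetical helper that accepts any DB-API cursor.

```python
# Hypothetical schema: every table and column name here is an assumption.

# Before: each claim locks up to 1000 object rows.
CLAIM_OBJECTS_SQL = """
SELECT id FROM object
WHERE needs_update
ORDER BY id
LIMIT 1000
FOR UPDATE SKIP LOCKED
"""

# After: each claim locks one batch row that stands for many objects.
CLAIM_BATCH_SQL = """
SELECT id FROM batch
ORDER BY id
LIMIT 1
FOR UPDATE SKIP LOCKED
"""


def claim_batch(cur):
    """Return the id of one unlocked batch, or None if none is available.

    `cur` is a DB-API cursor inside an open transaction. The row lock
    lasts until that transaction commits or rolls back.
    """
    cur.execute(CLAIM_BATCH_SQL)
    row = cur.fetchone()
    return row[0] if row else None
```

In a setup like this the worker would presumably process the batch and delete it (or mark it done) inside the same transaction, so that a crash releases the lock and another worker can pick the batch up again.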
smart. although, i guess that pushes the locking from selecting queue entries to making sure that objects are placed into exactly 1 batch. curious if you ran into any bottlenecks there?
> ... making sure that objects are placed into exactly 1 batch. curious if you ran into any bottlenecks there?
A single application-layer thread doing batches of batch creation (heh). Not instant, but fast enough. I did have to add 'batchmaker is done' onto the 'no batch left' condition for worker exit.
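The exit condition can be sketched with an in-memory stand-in. This is not the real system: a `queue.Queue` plays the role of the batch table, a `threading.Event` plays the 'batchmaker is done' flag, and the function name `run` and its parameters are made up for the example.

```python
import queue
import threading
import time


def run(object_ids, batch_size=1000, n_workers=4):
    """Process every object exactly once using one batchmaker and N workers."""
    batches = queue.Queue()          # stand-in for the batch table
    maker_done = threading.Event()   # the 'batchmaker is done' flag
    processed = []
    processed_lock = threading.Lock()

    def batchmaker():
        # A single thread slices the ids, so each object lands in exactly one batch.
        for start in range(0, len(object_ids), batch_size):
            batches.put(tuple(object_ids[start:start + batch_size]))
        maker_done.set()

    def worker():
        while True:
            # Read the flag BEFORE looking for work: if it was already set,
            # an empty queue really means there is nothing left to claim.
            done = maker_done.is_set()
            try:
                batch = batches.get_nowait()
            except queue.Empty:
                if done:
                    return           # no batch left AND batchmaker is done
                time.sleep(0.001)    # batchmaker still running, retry
                continue
            with processed_lock:
                processed.extend(batch)

    threads = [threading.Thread(target=batchmaker)]
    threads += [threading.Thread(target=worker) for _ in range(n_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return processed
```

Without the flag, a worker that starts before the first batch exists would see 'no batch left' and exit immediately.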
> ... that pushes the locking from selecting queue entries to ...
To selecting batches. A batch is immutable once created. If work has to be restarted to handle new/updated objects, all batches are wiped and the batchmaker (and workers, anyway) start over.