> What happens if the application crashes immediately after removing the work item but before it can do anything else? Doesn't this break exactly-once semantics... ?
If the connection is broken, the transaction is aborted and the lock is released. If the worker hits an infinite loop or something like that, you'd set `idle_in_transaction_session_timeout` so the server terminates sessions that sit idle inside an open transaction for too long, and/or have a worker monitoring system in place to kill long-running jobs.
The important thing to note is that a single DB instance and a client still constitute a distributed system. You still have almost all the same problems with 'exactly-once' semantics that you would with a distributed queue. If you want a system that provides effectively exactly-once semantics, make all processing jobs idempotent and support retries, regardless of the tech backing the queue.
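To make "idempotent plus retry" concrete, here is a minimal sketch. It uses an in-memory SQLite database only so that it runs standalone; the table names, the `job_id` key, and the `apply_credit` helper are all hypothetical and not taken from the linked post. The idea is that the side effect and the "already processed" marker commit in one transaction, so a redelivered job becomes a no-op.

```python
import sqlite3


def make_store():
    """Create a throwaway in-memory store (assumed schema, for illustration)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE processed (job_id TEXT PRIMARY KEY)")
    conn.execute(
        "CREATE TABLE balances (account TEXT PRIMARY KEY, amount INTEGER NOT NULL)"
    )
    conn.execute("INSERT INTO balances VALUES ('alice', 0)")
    conn.commit()
    return conn


def apply_credit(conn, job_id, account, amount):
    """Apply the credit at most once per job_id. Returns True if applied."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            # The primary key rejects a second insert for the same job_id.
            conn.execute("INSERT INTO processed (job_id) VALUES (?)", (job_id,))
            conn.execute(
                "UPDATE balances SET amount = amount + ? WHERE account = ?",
                (amount, account),
            )
        return True
    except sqlite3.IntegrityError:
        # Job was already processed by an earlier delivery: safe to skip.
        return False


def balance(conn, account):
    row = conn.execute(
        "SELECT amount FROM balances WHERE account = ?", (account,)
    ).fetchone()
    return row[0]
```

With this shape, the queue can deliver a job more than once (at-least-once) and the observable effect is still applied only once.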
Ah! You're assuming the work is performed inside the same transaction as the dequeue operation, with locks held for the duration?
If so...
While I suppose row-level locking technically solves contention, it still feels like we're "asking for trouble" by holding database locks while clients perform arbitrarily long work operations. There are also practical issues when the work itself is distributed and the original client can't keep state around, i.e. it has to end the low-level transaction.
Hence my poor-man's question/proposal using worker IDs and timeouts...
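A rough sketch of what such a worker-ID-plus-timeout scheme could look like follows. Everything here is an assumption made for illustration: the `jobs` schema, the 30-second `LEASE_SECONDS`, and the `claim`/`complete` helpers. SQLite is used only so the example runs on its own. Each claim is a short transaction, so no lock is held while the work runs.

```python
import sqlite3

LEASE_SECONDS = 30  # assumed lease length; tune to the expected job duration


def make_queue(payloads):
    """Create an in-memory jobs table (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE jobs ("
        " id INTEGER PRIMARY KEY,"
        " payload TEXT NOT NULL,"
        " claimed_by TEXT,"
        " lease_expires REAL,"
        " done INTEGER NOT NULL DEFAULT 0)"
    )
    conn.executemany("INSERT INTO jobs (payload) VALUES (?)", [(p,) for p in payloads])
    conn.commit()
    return conn


def claim(conn, worker_id, now):
    """Claim the oldest job that is unclaimed or whose lease has expired."""
    with conn:
        row = conn.execute(
            "SELECT id FROM jobs WHERE done = 0"
            " AND (claimed_by IS NULL OR lease_expires <= ?)"
            " ORDER BY id LIMIT 1",
            (now,),
        ).fetchone()
        if row is None:
            return None
        # Re-check the condition in the UPDATE so a concurrent claimer loses.
        cur = conn.execute(
            "UPDATE jobs SET claimed_by = ?, lease_expires = ?"
            " WHERE id = ? AND done = 0"
            " AND (claimed_by IS NULL OR lease_expires <= ?)",
            (worker_id, now + LEASE_SECONDS, row[0], now),
        )
        return row[0] if cur.rowcount == 1 else None


def complete(conn, worker_id, job_id, now):
    """Mark the job done only if this worker still holds a live lease."""
    with conn:
        cur = conn.execute(
            "UPDATE jobs SET done = 1"
            " WHERE id = ? AND claimed_by = ? AND lease_expires > ? AND done = 0",
            (job_id, worker_id, now),
        )
        return cur.rowcount == 1
```

Note the trade-off: a slow worker whose lease expires may have its job picked up by another worker, so the same job can run twice. The lease removes the long-held lock, but the job handlers still need to tolerate duplicate execution.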
> Ah! You're assuming the work is performed inside the same transaction as the dequeue operation, with locks held for the duration?
Yes, that is the model the linked post proposes; see the Example.
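For readers without the post at hand, the general shape of that model is roughly the following. This is a sketch under assumptions, not the post's actual code: the `queue` table, its `id`/`payload` columns, and `process_one` are hypothetical, and `conn` is assumed to be any Python DB-API connection to PostgreSQL that is not in autocommit mode. `FOR UPDATE SKIP LOCKED` is the PostgreSQL clause that lets concurrent workers skip rows another transaction has already locked.

```python
# Dequeue one row and keep it locked until the transaction ends.
DEQUEUE_SQL = """
DELETE FROM queue
WHERE id = (
    SELECT id FROM queue
    ORDER BY id
    LIMIT 1
    FOR UPDATE SKIP LOCKED
)
RETURNING id, payload
"""


def process_one(conn, handler):
    """Dequeue and process one job in a single transaction.

    Returns True if a job was processed, False if the queue looked empty.
    If handler raises (or the connection drops), the DELETE is rolled back
    and the job becomes visible to other workers again.
    """
    cur = conn.cursor()
    try:
        cur.execute(DEQUEUE_SQL)
        row = cur.fetchone()
        if row is None:
            conn.rollback()
            return False
        handler(row[1])  # work happens while the transaction is still open
        conn.commit()    # only now is the removal made permanent
        return True
    except Exception:
        conn.rollback()
        raise
```

The commit after `handler` is what ties the removal to the work: a crash before the commit leaves the item in the queue, which is why the job may be retried and should be idempotent.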
> While I suppose row-level locking technically solves contention, it still feels like we're "asking for trouble" by holding database locks while clients perform arbitrarily long work operations. There are also practical issues when the work itself is distributed and the original client can't keep state around, i.e. it has to end the low-level transaction.
Not that I recommend using PG as a queue, but you have most or all of those problems with any queuing backend. One problem that is PG-specific is that the number of open connections/transactions can become quite large with a lot of workers, and PG doesn't play well with a lot of connections because it uses a process-per-connection model.