
Not just Postgres.

You can do much the same with MySQL and SQL Server too: MySQL 8.0+ supports SKIP LOCKED directly, and SQL Server has the equivalent READPAST hint.
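A minimal sketch of the pattern with a DB-API driver (the table name, columns and connection string are made up; the same SELECT works on MySQL 8+, while SQL Server would use WITH (READPAST) instead):

    import psycopg2  # assumed driver; any DB-API driver works the same way

    conn = psycopg2.connect("dbname=queue_demo")  # hypothetical database

    def claim_one_job():
        # One transaction per job: the row stays locked while we work on it,
        # and other workers skip straight past it thanks to SKIP LOCKED.
        with conn, conn.cursor() as cur:
            cur.execute("""
                SELECT id, payload FROM jobs
                ORDER BY id
                LIMIT 1
                FOR UPDATE SKIP LOCKED
            """)
            row = cur.fetchone()
            if row is None:
                return None              # queue empty (or every row is locked)
            job_id, payload = row
            # ... process payload here, while the row lock is held ...
            cur.execute("DELETE FROM jobs WHERE id = %s", (job_id,))
            return job_id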

Interestingly, the plain old file system on Linux can also form the basis of a perfectly acceptable message queue for many use cases - the thing that makes it work is that the file move operation is atomic. Atomic moves are what make this kind of queuing system possible.
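A sketch of the producer side (paths and names invented for illustration, not taken from any particular project): write the message under a temporary name, then rename it into the queue directory, so a consumer can never see a half-written file.

    import os, uuid

    QUEUE = "/var/spool/myqueue"  # hypothetical layout with tmp/ and new/ subdirectories

    def enqueue(data: bytes) -> str:
        name = str(uuid.uuid4())
        tmp = os.path.join(QUEUE, "tmp", name)
        with open(tmp, "wb") as f:
            f.write(data)         # slow writes are fine; nothing is visible in new/ yet
        dst = os.path.join(QUEUE, "new", name)
        os.rename(tmp, dst)       # atomic within one filesystem: the message appears all at once
        return dst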

You could write a file-system-based message queue in 100 lines of async Python, which I did here:

https://github.com/bootrino/arniesmtpbufferserver

File-system-based message queues can be written in any language, are extremely simple and, most importantly, need zero configuration. One of the most frustrating things about queuing systems is configuration - and that includes database-backed queuing systems. They can also be fast - I wrote one in Rust that maxed out the hard disk's random write capability well before maxing out the CPU; from memory it beat most of the common queuing systems in terms of messages per second.

Not all use cases for queues need globally distributed message queues with the sort of guarantees required for financial transaction processing. I would suggest that in fact most queues out there are used as outbound SMTP queues, which are then over-engineered to use something like Celery, which is a nightmare to configure and debug.




We had difficulty with the purely advisory locking that Linux offers. The possibility of network filesystems also made it a pain.

Do you have any experience with either?


You don't lock, you move the file that is next to be processed. File moves are atomic. You move the file out of the list of files that are being picked up for processing.

Lock free.
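Roughly like this (directory names invented for illustration): each worker tries to rename the oldest file into a processing directory, whichever rename succeeds owns that message, and the losers just move on.

    import os

    NEW = "/var/spool/myqueue/new"               # hypothetical directories
    PROCESSING = "/var/spool/myqueue/processing"

    def claim_next():
        for name in sorted(os.listdir(NEW)):
            try:
                os.rename(os.path.join(NEW, name),
                          os.path.join(PROCESSING, name))
            except FileNotFoundError:
                continue                         # another worker claimed it first
            return os.path.join(PROCESSING, name)
        return None                              # queue is empty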


Network file systems do not support atomic moves, but you should not run such an application on a network file system.


I also found that advisory locking has a lot of gotchas, especially when used in multithreaded contexts (apparently you can lose the lock because a different thread closed a different file descriptor on the same file).
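For anyone who hasn't hit it: that is the documented behaviour of classic POSIX record locks (see fcntl(2)) - closing any descriptor for a file drops every lock the process holds on that file. A sketch of the trap (file path invented):

    import fcntl, os

    path = "/tmp/lock-demo"                  # hypothetical file
    open(path, "a").close()

    fd_a = os.open(path, os.O_RDWR)
    fcntl.lockf(fd_a, fcntl.LOCK_EX)         # exclusive POSIX record lock taken via fd_a

    fd_b = os.open(path, os.O_RDONLY)        # some other thread or library opens the same file...
    os.close(fd_b)                           # ...and closing that fd silently releases fd_a's lock

flock() locks and Linux open file description locks (F_OFD_SETLK) are tied to the open file description rather than the process, so they avoid this particular footgun.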


What makes it atomic is running publishers and consumers on the same box (since you're sharing a filesystem between them).

Also listdir is a big bottleneck here:

    while True:
        # get files in outbox
        files_in_outbox = [f'{PREFIX}/outbox/{x}' for x in os.listdir(f'{PREFIX}/outbox')]


>> What makes it atomic is running publishers and consumers on the same box (since you're sharing filesystem between those).

It's the move/rename that is atomic.

https://man7.org/linux/man-pages/man2/rename.2.html


Not really.

       However, there will probably be a window in which both oldpath and
       newpath refer to the file being renamed.
But that's not even the main point.

1. The move happens after the email is sent, so there is a window where the email is already being sent but the file still exists.
2. Even if you do the move first, there's still a window between os.listdir() and os.remove().
3. Complexity is O(N^2), because listdir() + getctime() are called on every iteration, so every message processed triggers a full directory scan.

If you just want to ensure order, it probably works fine at a small scale. But it would be unwise to run multiple consumers on a single instance, and impossible to run them on multiple instances.


I worked with a network filesystem that supported atomic renames, and we based an entire large-scale production system on the assumption that it would work (it did). The system supported YouTube and Google Play model training, regularly processing increments of hundreds of terabytes.


love to see this kind of hackery


I love to see effective yet brutally simple "redneck engineering" solutions to software problems, particularly ones that straddle the line between genius and stupidity in a way that makes architecture astronauts feel uncomfortable.

I used to work at a brokerage that dealt with a panel of around 12 providers, all of whom offered 5+ products that were updated multiple times per year to meet changing regulatory requirements. When a sales adviser recommended product X from provider Y to a customer, the adviser would then need to fill in an application form that was precisely tailored to that particular revision of that particular product. Bear in mind that these were complex, multi-sectioned forms. Needless to say, this created a huge workload for the dev team to keep track of all the product changes and update the UI accordingly each time.

At some point, someone on the dev team had the genius idea to simply take the PDF application forms from the provider, extract the individual pages as PNGs and overlay HTML form elements on top of them. The provider would essentially be doing our UI design for us. Add in an editor tool so the sales managers could set up the forms themselves, plus a tagging system so specific fields could be pre-filled from customer data we already had stored in the DB, and the dev team's workload dropped by maybe 90%. Simple, stupid perhaps, but effective.



