
Not just Postgres.

You can do much the same with MySQL and SQL Server too: MySQL 8.0+ supports SKIP LOCKED directly, and SQL Server has the equivalent READPAST hint.
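A minimal sketch of the pattern with a DB-API driver (the table name, columns and connection string are made up; the same SELECT works on MySQL 8+, while SQL Server would use WITH (READPAST) instead):

    import psycopg2  # assumed driver; any DB-API driver works the same way

    conn = psycopg2.connect("dbname=queue_demo")  # hypothetical database

    def claim_one_job():
        # One transaction per job: the row stays locked while we work on it,
        # and other workers skip straight past it thanks to SKIP LOCKED.
        with conn, conn.cursor() as cur:
            cur.execute("""
                SELECT id, payload FROM jobs
                ORDER BY id
                LIMIT 1
                FOR UPDATE SKIP LOCKED
            """)
            row = cur.fetchone()
            if row is None:
                return None              # queue empty (or every row is locked)
            job_id, payload = row
            # ... process payload here, while the row lock is held ...
            cur.execute("DELETE FROM jobs WHERE id = %s", (job_id,))
            return job_id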

Interestingly, the plain old file system on Linux can also form the basis of a perfectly acceptable message queue for many use cases - the thing that makes it work is that the file move operation is atomic. Atomic moves are what make this kind of queuing system possible.
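A sketch of the producer side (paths and names invented for illustration, not taken from any particular project): write the message under a temporary name, then rename it into the queue directory, so a consumer can never see a half-written file.

    import os, uuid

    QUEUE = "/var/spool/myqueue"  # hypothetical layout with tmp/ and new/ subdirectories

    def enqueue(data: bytes) -> str:
        name = str(uuid.uuid4())
        tmp = os.path.join(QUEUE, "tmp", name)
        with open(tmp, "wb") as f:
            f.write(data)         # slow writes are fine; nothing is visible in new/ yet
        dst = os.path.join(QUEUE, "new", name)
        os.rename(tmp, dst)       # atomic within one filesystem: the message appears all at once
        return dst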

You could write a file-system-based message queue in 100 lines of async Python, which I did here:

https://github.com/bootrino/arniesmtpbufferserver

File-system-based message queues can be written in any language, are extremely simple and, most importantly, need zero configuration. One of the most frustrating things about queuing systems is configuration - and that includes database-backed queuing systems. They can also be fast - I wrote one in Rust that maxed out the hard disk's random write capability well before maxing out the CPU; from memory it beat most of the common queuing systems in terms of messages per second.

Not all use cases for queues need globally distributed message queues with the sort of guarantees required for financial transaction processing. I would suggest that in fact most queues out there are used as outbound SMTP queues, which are then over-engineered to use something like Celery, which is a nightmare to configure and debug.




We had difficulty with the purely advisory locking that Linux offers. The possibility of network filesystems also made it a pain.

Do you have any experience with either?


You don't lock, you move the file that is next to be processed. File moves are atomic. You move the file out of the list of files that are being picked up for processing.

Lock free.
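Roughly like this (directory names invented for illustration): each worker tries to rename the oldest file into a processing directory, whichever rename succeeds owns that message, and the losers just move on.

    import os

    NEW = "/var/spool/myqueue/new"               # hypothetical directories
    PROCESSING = "/var/spool/myqueue/processing"

    def claim_next():
        for name in sorted(os.listdir(NEW)):
            try:
                os.rename(os.path.join(NEW, name),
                          os.path.join(PROCESSING, name))
            except FileNotFoundError:
                continue                         # another worker claimed it first
            return os.path.join(PROCESSING, name)
        return None                              # queue is empty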


Network file systems do not support atomic moves, but you should not run such an application on a network file system.


I also found that advisory locking has a lot of gotchas, especially when used in multithreaded contexts (apparently you can lose the lock because a different thread closed a different file descriptor on the same file).
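For anyone who hasn't hit it: that is the documented behaviour of classic POSIX record locks (see fcntl(2)) - closing any descriptor for a file drops every lock the process holds on that file. A sketch of the trap (file path invented):

    import fcntl, os

    path = "/tmp/lock-demo"                  # hypothetical file
    open(path, "a").close()

    fd_a = os.open(path, os.O_RDWR)
    fcntl.lockf(fd_a, fcntl.LOCK_EX)         # exclusive POSIX record lock taken via fd_a

    fd_b = os.open(path, os.O_RDONLY)        # some other thread or library opens the same file...
    os.close(fd_b)                           # ...and closing that fd silently releases fd_a's lock

flock() locks and Linux open file description locks (F_OFD_SETLK) are tied to the open file description rather than the process, so they avoid this particular footgun.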


What makes it atomic is running publishers and consumers on the same box (since you're sharing a filesystem between them).

Also listdir is a big bottleneck here:

    while True:
        # get files in outbox
        files_in_outbox = [f'{PREFIX}/outbox/{x}' for x in os.listdir(f'{PREFIX}/outbox')]


>> What makes it atomic is running publishers and consumers on the same box (since you're sharing filesystem between those).

It's the move/rename that is atomic.

https://man7.org/linux/man-pages/man2/rename.2.html


Not really.

       However, there will probably be a window in which both oldpath and
       newpath refer to the file being renamed.
But that's not even the main point.

1. The move happens after the email is sent, so there is a window where the email is already being sent but the file still exists.
2. Even if you do the move first, there's still a window between os.listdir() and os.remove().
3. Complexity is O(N^2), because listdir() + getctime() are called on every iteration, so every message processed triggers a full directory scan.

If you just want to ensure order, it probably works fine at a small scale. But it would be unwise to run multiple consumers on a single instance, and impossible to run them on multiple instances.


I worked with a network filesystem that supported atomic renames, and we based an entire large-scale production system on the assumption that it would work (it did). The system supported YouTube and Google Play model training, regularly processing increments of hundreds of terabytes.


love to see this kind of hackery


I love to see effective yet brutally simple "redneck engineering" solutions to software problems, particularly ones that straddle the line between genius and stupidity in a way that makes architecture astronauts feel uncomfortable.

I used to work at a brokerage that dealt with a panel of around 12 providers, all of whom offered 5+ products that were updated multiple times per year to meet changing regulatory requirements. When a sales adviser recommended product X from provider Y to a customer, the adviser would then need to fill in an application form that was precisely tailored to that particular revision of that particular product. Bear in mind that these were complex, multi-sectioned forms. Needless to say, this created a huge workload for the dev team to keep track of all the product changes and update the UI accordingly each time.

At some point, someone on the dev team had the genius idea to simply take the PDF application forms from the provider, extract the individual pages as PNGs and overlay HTML form elements on top of them. The provider would essentially be doing our UI design for us. Add in an editor tool so the sales managers could set up the forms themselves, plus a tagging system so specific fields could be pre-filled from customer data we already had stored in the DB, and the dev team's workload dropped by maybe 90%. Simple, stupid perhaps, but effective.



