I've been met with looks of disgust for using a filesystem to implement a queue, but I feel that's unjustified. A modern Unix filesystem is surprisingly well suited to the task: you get atomicity "for free"; inotify allows it to be interrupt driven rather than polled; it inherently supports multiple processes (so different parts of the system can be implemented in different languages); there's no need for locking as long as you implement the queue using directories and 'mv'; and it's extremely quick to implement, understand, and modify.
The only caveats are performance (with a traditional server I wouldn't worry about it until you need to process hundreds of items per second, but on EC2 nodes that threshold is closer to dozens per second) and the need to regularly archive the "done" directory (cron solves this nicely).
I recently read through news.arc (the source to Hacker News itself) and was dumbfounded by how such a simple, file-system backed system was able to cleanly and performantly handle many of the use cases of a document store or key value store. Are there any good resources on the DOs and DONOTs of building apps in this "Hey.... dummy.. Just use the file system!" -style?
Watch out for the 32,000 subdirectory limit. If your job tickets are complex enough to be implemented as a directory instead of a file, you'll get bitten by this (the number of files in a directory is only limited by the number of inodes in the entire filesystem).
If you are really lucky and your tickets only need to represent a single piece of data (some sort of ID, for example), you can use the name of the file itself for data storage and deal only with empty files. Because this uses only a single inode/block, it represents the best-case scenario for speed and scalability in terms of the number of tickets that can accumulate before you need to archive. More likely, though, you are going to have to worry about ticket namespace collisions (unless you have some sort of "set"-like requirement where each ID can only be in the queue once at a time), which means using something like mktemp to create the file and then storing the ID inside it.
Another key is to make sure you create new jobs in a "staging" dir, and then mv them into the "in" dir. Otherwise you have a race condition between your queuing system and whatever creates the tickets.
Here's a basic layout: /stage, /in, /active, /done. Some process on your system creates a ticket (which could be a single file or a dir) in /stage and then moves it into /in. This wakes up your queue, which moves it to /active when it starts processing it, and then moves it to /done and moves on to the next ticket in /in.
Another nice thing this gives you is that recovering from a crash / unclean state amounts to running ls on /stage, /in, and /active.
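For what it's worth, here's a minimal sketch of that layout in shell. It assumes everything lives on one filesystem (so 'mv' is an atomic rename), that inotify-tools is installed for inotifywait, and that the queue path and the do_work command are placeholders:

    #!/bin/sh
    QUEUE=/var/spool/myqueue    # placeholder path
    mkdir -p "$QUEUE/stage" "$QUEUE/in" "$QUEUE/active" "$QUEUE/done"

    # Producer: build the ticket in stage/, then mv it into in/ so the
    # consumer never sees a half-written ticket.
    enqueue() {
        ticket=$(mktemp "$QUEUE/stage/ticket.XXXXXX")
        echo "$1" > "$ticket"              # e.g. an ID, or a path to a PDF
        mv "$ticket" "$QUEUE/in/"          # atomic rename, no locking needed
    }

    # Consumer: claim with mv, archive when done, block on inotify instead of polling.
    while true; do
        for t in "$QUEUE/in"/*; do
            [ -e "$t" ] || continue
            name=$(basename "$t")
            mv "$t" "$QUEUE/active/" || continue   # lost the race to another worker
            do_work "$QUEUE/active/$name"          # placeholder for real processing
            mv "$QUEUE/active/$name" "$QUEUE/done/"
        done
        # Wake when something is moved into in/ (a timeout, -t 60, would also
        # paper over the small race between the scan above and this wait).
        inotifywait -qq -e moved_to "$QUEUE/in"
    done

Crash recovery is then just a matter of looking at what's left in stage/ and active/, and a nightly cron job can tar up done/.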
One top tip from personal experience is to make the resulting structure reasonably straightforward to browse manually - having huge numbers of subdirectories is going to be a barrier to this.
I can't find a reference with explicit "DOs and DONOTs", but you can certainly gather experience from systems that have used this scheme for a long, long time: mail handling systems.
For a quick start, I would look at the maildir specification, which includes instructions on how you should read from and write to maildir folders to avoid locking and get good performance: http://www.qmail.org/man/man5/maildir.html
Then, I would dive deeper by looking at the processes used to maintain the mail queues in qmail: http://www.qmail.org/qmail-manual-html/misc/INTERNALS.html . Obviously, you could also look at how postfix or exim handle their own queues.
Anyway, gathering all the experience buried in those systems and summarizing it in a logical way would make a great great article...
One big DONOT is: Don't do this if you need more than one physical host to be processing the jobs at the same time. Resiliency is hard to get right with shared filesystems.
You know, I set something up in a very similar way a few years back for a client. It was a quick and dirty hack to get a processing queue up and running fast with low overhead on the server (a VM with no resources). The processing was to take a PDF that would appear in the directory and then email or fax it, depending on the directory.
I felt dirty while doing it, but didn't want to build up a whole ActiveMQ (or similar) queue solution - it was just overkill.
Six years on, that simple hack is still working without needing any sort of maintenance.
I suspect there's a large overlap between the people who would ridicule such an approach and the very people who find themselves in need of this article :)
A while back I looked at moving part of the queue into MySQL, but I got stuck trying to keep it from becoming a polling-based system (I should have been able to accomplish this by having a MySQL trigger touch a file in the filesystem, which would fire inotify and wake up the queue, but I couldn't get it to work as described in the docs). After reading the author's mention of PostgreSQL having some sort of listen/notify feature, I'll have to give that a look.
I can't vouch for the performance characteristics, but it's got some nice features around how notification delivery interacts with transactions (notifications sent within an explicit transaction are not delivered until and unless the transaction commits successfully, and the order of notifications from a single transaction is preserved), guaranteed delivery, and some degree of deduplication of identical notifications.
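For anyone curious, a minimal sketch of that behavior (database, table, and channel names are made up, and it assumes a jobs table already exists); the NOTIFY is only delivered if the COMMIT succeeds, so listeners never hear about jobs that were rolled back:

    # Producer side
    psql mydb <<'SQL'
    BEGIN;
    INSERT INTO jobs (payload) VALUES ('send-welcome-email:42');
    NOTIFY job_ready, 'send-welcome-email:42';   -- delivered only on COMMIT
    COMMIT;
    SQL

    # Consumer side: LISTEN must run in a session that stays open, e.g. an
    # interactive psql (which prints pending notifications after the next
    # command it executes) or a client library that exposes the socket:
    #   mydb=> LISTEN job_ready;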
However... PostgreSQL's "SELECT FOR UPDATE" seems to have significantly better performance than MySQL's version, most likely due to how concurrency and MVCC vs. locking interact. A few years back, at a now mostly-defunct social network which shall remain nameless, I had to implement a cluster-wide work queue for sending out member emails; it couldn't involve installing new software, and there was no shared disk space to use for the filesystem style. A queue based on an existing PostgreSQL installation (the PG process had a three-year uptime at that point), using "SELECT ... FOR UPDATE WHERE worker_id IS NULL LIMIT 1" followed by an immediate update of worker_id and the end of the transaction, had quite good performance on mid-2000s hardware. As far as I could tell from my research at the time, the LIMIT 1 with no ordering clause locked only one row, and concurrent processes each got a different one, so they didn't have to serialize on grabbing a job. Definitely do your own research and testing, but in my experience SELECT FOR UPDATE, used carefully and with a thorough reading of the docs, is a much more viable solution on PostgreSQL than MySQL for a few hundred worker processes. I wouldn't try it for G+ or Twitter, but if you're dealing with more than the 50-100K daily active visitors and the 25M or so customized emails that went out monthly, I suspect you know you're going to be putting in some extra engineering time.
http://www.postgresql.org/docs/9.1/static/sql-select.html#SQ...
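Roughly the pattern described above, as a sketch (table and column names are invented, and in real code the selected id would flow through the application rather than being hard-coded):

    psql mydb <<'SQL'
    CREATE TABLE IF NOT EXISTS email_queue (
        id        serial PRIMARY KEY,
        payload   text,
        worker_id text
    );

    BEGIN;
    -- grab and lock one unclaimed row
    SELECT id FROM email_queue
     WHERE worker_id IS NULL
     LIMIT 1
       FOR UPDATE;
    -- the application reads that id, stamps it, and commits immediately,
    -- so the row lock is held as briefly as possible
    UPDATE email_queue SET worker_id = 'worker-17' WHERE id = 42;
    COMMIT;
    SQL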
> The only caveats are performance (with a traditional server I wouldn't worry about it until you need to process hundreds of items per second, but on EC2 nodes that threshold is closer to dozens per second) and the need to regularly archive the "done" directory (cron solves this nicely).
...but why would you worry about these problems when other solutions like kestrel, beanstalk, and redis (my personal favorite) are equally easy to set up and understand?
And for that matter, how do you give multiple machines access to this workqueue?
Yeah, but why not just skip the intermediary step and use a "real" queueing system to begin with? It doesn't sound to me like it's any more effort in the short term or in the long term, and it's one less thing you have to worry about as you scale.
I think making files in a folder represents the least amount of effort for making a queue. So using one of the systems you described is necessarily more work.
I think the commenter outlined the reasons: any process can access the data with simple unix commands, and everyone understands files.
Plus, files could be more efficient. What if the work units you are processing are files? If the files are the work and the folder is the queue, you don't need any extra abstractions to access the data.
Because then you have to admin the real queueing system. If it's something simple, sometimes the one-time cost of re-solving the problem is less than the ongoing cost of dealing with that damn queueing system every time someone wants to set the app up on a new host, or a new dev wants to work with it, or it crashes, etc.
It's a fine line as to when either approach is appropriate.
Beyond the novelty of doing it as a learning exercise the first time 'round, I agree that your approach is better where there's any expectation of scale.
Agreed++. I've done similar filesystem queues and have been told by coworkers "Yuck! That needs to be in a database!" ... so I ask why ... and the answer is "Because that's what databases are for!" Inevitably these coworkers inherit the project, database every aspect of it, and then the app promptly collapses into a steady stream of downtime alerts. Yes, that's what databases are for: keeping DBAs gainfully employed.
Quick & dirty solutions like this often get dismissed out of hand, but in practice something like this can be thrown together in a day and perform well enough to last until you know you've built something that merits a more robust implementation.
Unix's "everything is a file" philosophy can be stretched pretty damn far.
But I agree. Files and folders are an elegant abstraction that, when combined with the Unix toolset, becomes extremely powerful.
The big shortcoming I see with this solution, and maybe this is what you are saying in the caveats, is that it doesn't support multiple worker boxes.
Of course you could use NFS, but that complicates things: suddenly the consistency model is more complex, workers must partition the work, and so on. At that point, a MySQL-backed queue becomes an appealing and easy way to make a distributed queue.
My experience is that when something is filesystem based, you eventually have someone write a not-robust-enough bash script to do some maintenance operation (find|xargs|rm cleanup script, a sed based update script, etc) and it blows stuff up.
I think the transaction log and the forced structure of using SQL (barring some yutz carelessly using TRUNCATE) add some value managing the data, too. Not as big an issue where it's a single person maintaining the app.
I think, as he said, not everyone should run out and replace the MySQL job queue behind their WordPress blog. In a great many cases it doesn't matter.
I also like how he never said "Don't use mysql as a queuing system" but "be careful of these things". I've used mysql as a queuing system, and it works fine. I looked at replacing it with a different database, but in that situation it was not worth the investment.
Signaling MySQL + archiving completed work + no locks broader than the exact row being updated (while also preventing concurrent workers from acting on the same task) will take a MySQL-backed queuing system far. I've set up a system that processes well over 5,000 tasks/day using it.
Do I think everyone should use mysql as their queuing backend? No. People should probably use a queuing library, with persistence to a database (redis?) enabled for critical tasks. Of course, as the article said, be careful about the choice of backends.
That is about one task per 50 billion CPU cycles (5,000 tasks/day works out to roughly one task every 17 seconds, and a single ~3 GHz core runs about 3 billion cycles per second); you would be hard pressed to write a queue implementation that is too slow for that. Numbered files in a single directory, on a synchronously mounted filesystem designed for few large files that only allows linear directory scans, might qualify, but I am not even sure about that.
This is why Redis / *SQL is my favored stack. It just covers so many bases: you get things like safe queuing, caching, pub/sub, and weird high-performance low-durability cases from Redis, and great, safe relational support from SQL.
For best results, it's good to have at least two Redis servers: one with snapshotting, as a cache (fast, less durable), and one with a one-second append-only file fsync (still fast, but slower) for data you care more about.
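A sketch of what that split can look like in redis.conf terms (ports and filenames are arbitrary):

    # cache.conf: snapshotting only; fast, fine to lose recent writes on a crash
    cat > cache.conf <<'EOF'
    port 6379
    save 900 1
    save 60 10000
    appendonly no
    EOF

    # durable.conf: append-only file, fsynced roughly once per second
    cat > durable.conf <<'EOF'
    port 6380
    save ""
    appendonly yes
    appendfsync everysec
    EOF

    redis-server cache.conf &
    redis-server durable.conf &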
If your queues are getting that backed up, you're either facing bigger problems than your queueing system (most likely workers being down), or you're big enough to afford more machines and/or a custom solution (such as kestrel).
The parent post referred to using Redis as a multi-purpose store. If there is a bunch of other non-queue data in Redis and the setup is only expecting it to use 1GB or so, there's likely not a giant amount of room for queue entries left. While everyone is sleeping, some crashed workers combined with broken or poorly configured monitoring can fill up queues very quickly. Been there, done that. Either the OOM hits and there is some data loss or the swap hits and brings everything down.
Yeah, that's true. But my environment is such that any one of 100 or so app servers has a significantly lower chance of running out of memory than the Redis server does.
The 0MQ high-water mark is set high enough that it's virtually impossible not to have fixed a broken DB by the time the messages queued in memory on the client side would create an OOM condition.
Ultimately, it's all about the odds. That's what HA, replication, and DR are all about. It's so statistically unlikely for certain things to happen, they just fall out of the realm of reason. Most operations folk I've talked to don't even consider their disaster recovery plans to be within the realm of feasibility. The chances of a catastrophic event rendering the owners of the system defunct is many orders of magnitude more likely than an event that breaks the standard data fail-safes that most datacenters have in place.
Unless there are so many items per second that your persistence can't keep up. Wouldn't that create much the same situation? You can't accept all the new items and have to throw some away; only now everything is slower. A lot slower.
Ok, the first scenario is caused by the workers being too slow, so it's not exactly the same :)
Yes, persistent queues have issues, which is why 0MQ exists. But using a non persistent queue to deal with overflow just delays the problem, which was why I asked...
Queues in the DB are so common that in MSSQL they made it a first-class feature: SQL Server Service Broker. Using it is an XML and T-SQL nightmare, but since it guarantees in-order, exactly-once delivery and supports routing and in-DB worker activation, you can build some really robust and powerful stuff with it.
> Instead of SELECT FOR UPDATE followed by UPDATE, just UPDATE with a LIMIT, and then see if any rows were affected
It should be noted that this is not necessarily a good solution: a concurrent consumer (which may be another incarnation of the same script, running behind schedule) can hijack a queue element claimed this way; as a result you may end up with two or more incarnations of the consumer handling the same queue element.
The most universal approach to DB queues is to assign each consumer process a unique ID which it should use for locking queue elements in their UPDATE ... LIMIT 1.
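One possible shape of that claim-with-a-unique-ID approach (table and column names are invented): each consumer stamps at most one row with its own ID, then selects back whatever it actually managed to claim.

    CONSUMER_ID="$(hostname)-$$"

    mysql mydb <<SQL
    CREATE TABLE IF NOT EXISTS job_queue (
        id         INT AUTO_INCREMENT PRIMARY KEY,
        payload    TEXT,
        claimed_by VARCHAR(64),
        claimed_at DATETIME,
        done_at    DATETIME
    ) ENGINE=InnoDB;

    -- claim at most one unclaimed job
    UPDATE job_queue
       SET claimed_by = '${CONSUMER_ID}', claimed_at = NOW()
     WHERE claimed_by IS NULL
     ORDER BY id
     LIMIT 1;

    -- see what (if anything) this consumer now owns
    SELECT id, payload
      FROM job_queue
     WHERE claimed_by = '${CONSUMER_ID}'
       AND done_at IS NULL;
    SQL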
I'm just going to count killing a SLEEP(100000) query as a means of signalling a worker as the something new I learned today. I'm not sure I've ever written anything where implementing that would have had any real impact, but it's filed away for the future.
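For reference, the trick looks roughly like this (database name and table are placeholders); killing the SLEEP query wakes the parked worker immediately instead of waiting out a poll interval.

    # Worker: park on a long SLEEP instead of polling; a kill (or the timeout)
    # wakes it up, after which it checks the queue table for new work.
    mysql -N -e "SELECT SLEEP(100000)" mydb >/dev/null 2>&1 || true
    # ...now look for new jobs...

    # Producer: after inserting a job, wake any parked workers by killing
    # their SLEEP queries (ids come from the processlist).
    mysql -N -e "SELECT id FROM information_schema.processlist
                  WHERE info LIKE 'SELECT SLEEP%'" |
    while read qid; do
        mysql -e "KILL QUERY $qid"
    done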
The article didn't mention the main advantage of storing queues in DB - transactions. Say you need to update other records in DB while processing a job with 100% consistency. If it's all in the same DB you can update both job as well as data in a single transaction.
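A sketch of that consistency win, with made-up tables: the job row and the business data it touches change in one transaction, so either both land or neither does.

    mysql mydb <<'SQL'
    START TRANSACTION;

    UPDATE job_queue
       SET done_at = NOW()
     WHERE id = 42;

    UPDATE accounts
       SET balance = balance - 10
     WHERE id = 7;

    COMMIT;
    SQL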
There are three possible ways that I know of to handle concurrency issues on shared data: use some sort of journaling approach (Google keywords: 'snapshots' or MVCC), use locks, or ignore the problem.
If it ignores the problem, it's not a database.
Locks suck for volume. Locks cause many more deadlocks than the other options. Locks are fast in the simple case. Locks are easier to program against and use fewer resources.
InnoDB uses locks.
SQL Server also defaults to locks. People often specify 'ignore the problem' mode (NOLOCK / READ UNCOMMITTED). There is a newer journaling approach available (snapshot isolation), but it was only introduced in 2005 and I don't think many people are using it yet. Which is a shame; it's a great feature.
Oracle and Postgres both take a journaling approach. They will have fewer deadlock problems because readers and writers don't need to block each other. With InnoDB or default SQL Server, read locks block writes, which sucks. See http://dev.mysql.com/doc/refman/5.0/en/innodb-lock-modes.htm...
The title of the article is not very representative of its contents. It should be: "5 subtle ways you’re using MySQL as a queue, and how it COULD bite you IF you use poor schema design NOT optimized for YOUR workload".
MySQL queues work just fine with the recommendations Baron provides himself: 1) avoid polling; 2) avoid locking; 3) avoid mixing queue and archive tables.
Was broken on first load for me, Ubuntu 11.04 Chrome 13.0.782.220.
However when I reloaded the page it fixed itself. In fact as the page reloads I can see the text layout first breaking and then immediately fixing itself...
I've been in a situation where I needed to queue about 100k messages, each one unique, with custom attributes also populated from MySQL.
I used to generate the messages and then insert them into the queuing system, but for 100k messages I never managed to make that fast... I did manage to queue all of them in less than half a second using just one MySQL query.
If anyone has any better ideas, please let me know!
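Presumably the "one query" above is an INSERT ... SELECT along these lines, which builds every message row inside the server instead of round-tripping 100k INSERTs from the client (table and column names are invented):

    mysql mydb <<'SQL'
    INSERT INTO message_queue (user_id, email, subject, created_at)
    SELECT u.id, u.email, CONCAT('Hello, ', u.name), NOW()
      FROM users u
     WHERE u.wants_newsletter = 1;
    SQL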
MongoDB offers findAndModify, which makes for a good synchronized queue up to a point. If anyone's using PHP and Mongo, feel free to take a look at MongoQueue: https://github.com/lunaru/MongoQueue
Once you start hitting hundreds of jobs per second you'll want to scale horizontally, but that shouldn't be necessary in 99% of use cases.
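A sketch of the findAndModify-as-queue-pop idea via the mongo shell (collection and field names are invented); the query and update are applied atomically, so two workers can't claim the same job:

    mongo mydb --eval '
        printjson(db.jobs.findAndModify({
            query:  { locked_by: null },
            sort:   { created_at: 1 },
            update: { $set: { locked_by: "worker-1", locked_at: new Date() } },
            new:    true
        }))
    '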
Hey there. The pattern in TFA is somewhat different: in the SWR pattern not every worker talks to the DB. Instead, only the selector does. It then hands out the work to the workers via a fast local queue. The ratio I set up for Ping Brigade is 1:1000 selector to workers. Thus a handful of selectors can feed a few thousand workers.
This may be working great in your application, but you're implying that it scales nicely, and you're wrong about that. Hand-waving may work in your case, but Baron's whole point was that there are easy solutions that will make things better if and when an application grows to the point where this becomes an issue.
I've personally been down this road many times, and the last time I made the mistake of relying on SELECT FOR UPDATE in a queueing system, it broke down somewhere on the road between 1 msg/sec and 50 msgs/sec. That application committed before it dispatched to the worker app, so I would consider it a fairly similar access pattern to yours.
The solution I went with in that case was exactly what Baron describes at "Locking is actually quite easy to avoid": something along the lines of UPDATE queue SET selected_by = dispatcher_id, selected_time = NOW() ... and then SELECT * FROM queue WHERE selected_by = dispatcher_id. I hate putting pseudo-SQL out there because it plants bad ideas in some random reader's head. Anyway, that scaled up to several thousand messages per second and ran happily for years, long after I left that particular company. It may still be running, depending on who you ask.
Long story short, it's great that your solution is working for you but the weight of public knowledge suggests it's not a great solution for anyone else to pick up on. Ping Brigade looks nifty, I hope it works great for you. Please don't suggest this pattern to other people.
Personally the system I work on day-to-day these days runs a Redis set-based queue similar to Resque to send a few thousand emails per second and I'm ok with it. Not thrilled, but happy enough that I don't read the Resque introduction text and blanch in horror as I did reading your article, especially as a reply to Baron's which is based on... lots and lots of real world experience with many different applications.
Fair points. This solution works for me for now, and I certainly know it is not limitless. The solution with setting selected_by is something I thought about and may implement at a later point. I'm also a big fan of using GET_LOCK() for locking, rather than relying on MySQL/InnoDB's built-in locking, since you get finer-grained control over timeouts, etc.
I understand your concern about sharing this "dangerous" knowledge, but I disagree that the solution is to hide it in a deep dark place. Would you find it acceptable if I updated my post with a discussion on scalability and a link to TFA? That way a reader will get more information about building such systems, not less.
The point of the UPDATE-then-SELECT-if-rows-were-updated pattern is that it's a mark-and-sweep done completely outside of a transaction. Avoid locking > lock as little as possible > hold locks as briefly as possible.
Your blog is your business; the Lorax speaks only for HN. (Hey, is Internet Lorax a job?) It would of course be great to update your readers, but it'd be cooler to update your application and tell everyone how it worked out! Then you've got a story you can actually submit again.
I always wished that MySQL had a skip-locked-rows feature, so that if you do a SELECT FOR UPDATE it would skip any rows that are already locked. That way, if you built a queueing system, you could run SELECT FOR UPDATE and simply skip the rows that are already being processed (the locked rows).
I actually implemented this once, partially, on InnoDB, and it worked pretty well (no waiting for locks), but I abandoned my efforts due to another project.