Hacker News
A lightweight, high-performance, language-independent job queue system (github.com/fireworq)
108 points by y_uuki on Dec 22, 2017 | hide | past | favorite | 38 comments



Clearly a lot of thought and work has been put into this and I give the authors credit for that.

I appreciate any new entry, but this is a really crowded space, with some very battle-tested incumbents, as people have pointed out.

There are a few questions not in the readme that I think need answering for a queue system:

- Execution guarantees (at least once, at most once, exactly once?)

- Order guarantees (FIFO, approximately FIFO, nondeterministic, etc.)

- Throughput compared to other systems

- Fault tolerance characteristics: how many nodes can I lose before it stops working, and how do I recover when it does?

As I said, I have a lot of choices in this field. I'd like to see all the data up front.


Agreed, there are a lot of more robust products already in this space.

My biggest concern is its reliance on MySQL. There is no way this could be a valid option for high-volume messaging when it is essentially database-as-a-queue.


Surprised nobody has mentioned beanstalkd yet: https://kr.github.io/beanstalkd/

> Beanstalk is a simple, fast work queue.

> Its interface is generic, but was originally designed for reducing the latency of page views in high-volume web applications by running time-consuming tasks asynchronously.

The protocol is easy to drive directly and there are good libraries for most common languages.


Beanstalkd is always the queue engine I reach for. Sometimes I'll build something over SQL if it's low-volume, but I always design my queue APIs to match beanstalkd's so it can be swapped in.

Beyond being a great piece of software, I find the protocol to be really well-designed for a work queue.
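Designing your internal queue API to match beanstalkd's verbs might look something like this in-memory stand-in (a hypothetical sketch, not the real protocol; as in beanstalkd, lower numbers mean higher priority):

```python
import heapq
import itertools

class MemoryQueue:
    """In-memory stand-in exposing beanstalkd-style verbs
    (put / reserve / delete / release), so a real beanstalkd
    client could later be swapped in behind the same interface."""

    def __init__(self):
        self._counter = itertools.count()   # tie-breaker for equal priorities
        self._ready = []                    # heap of (priority, job_id, body)
        self._reserved = {}                 # job_id -> body

    def put(self, body, priority=1024):
        job_id = next(self._counter)
        heapq.heappush(self._ready, (priority, job_id, body))
        return job_id

    def reserve(self):
        # Hand out the highest-priority ready job and mark it reserved.
        priority, job_id, body = heapq.heappop(self._ready)
        self._reserved[job_id] = body
        return job_id, body

    def delete(self, job_id):
        # Job finished successfully; forget it.
        del self._reserved[job_id]

    def release(self, job_id, priority=1024):
        # Job failed or was abandoned; put it back in the ready queue.
        body = self._reserved.pop(job_id)
        heapq.heappush(self._ready, (priority, job_id, body))
```

Call sites only ever touch put/reserve/delete/release, so swapping in a class with the same methods backed by an actual beanstalkd connection needs no changes elsewhere.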


It would be interesting to hear the rationale behind using this over something like RabbitMQ, which has its own storage layer, as well as the queue aspect.


I made one of these once: https://github.com/thruflo/ntorque

As other comments have pointed out, HTTP is not a great fit here because of timeouts, etc.

The one subtle benefit that I can relay is that by using your main database as the storage layer, you can enqueue tasks within a transaction, as per: https://github.com/thruflo/ntorque/blob/master/src/ntorque/c...
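A minimal sketch of that transactional-enqueue benefit, with sqlite3 standing in for the main database (table and column names are made up for illustration):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, total REAL)")
conn.execute("CREATE TABLE jobs (id INTEGER PRIMARY KEY, kind TEXT, payload TEXT)")

def place_order(total):
    # The business write and the job enqueue commit (or roll back)
    # together, so a job can never reference an order that was
    # never saved.
    with conn:  # one transaction
        cur = conn.execute("INSERT INTO orders (total) VALUES (?)", (total,))
        conn.execute(
            "INSERT INTO jobs (kind, payload) VALUES (?, ?)",
            ("send_receipt", str(cur.lastrowid)),
        )

place_order(9.99)

# A failure after the order insert rolls back both writes:
try:
    with conn:
        conn.execute("INSERT INTO orders (total) VALUES (?)", (5.00,))
        raise RuntimeError("payment step failed")
except RuntimeError:
    pass
```

With a separate broker you would instead need an outbox table or two-phase logic to get the same guarantee.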


Database-as-queue tends to hit performance ceilings sooner than dedicated "true queue" systems, in my experience. As you point out, though, sticking with an RDBMS gives you nice transactionality. Using a database for a queue for as long as you can before switching also gets you a time-tested, widely supported protocol with well-documented reliability guarantees. Your queues are also easily introspectable, which can be nice (though this is part and parcel of why databases tend to hit perf ceilings as queues).

RabbitMQ also supports transactions for some queue operations, but its notion of "transaction" and what you can do inside of one is much more limited than a typical database's: https://www.rabbitmq.com/semantics.html


Can someone explain what happens here: https://github.com/fireworq/fireworq/blob/ee0d07eb18e1dba954...

Is that because mysql doesn't have FOR UPDATE SKIP LOCKED?
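If I'm reading it right, yes: MySQL at the time had no FOR UPDATE SKIP LOCKED (it arrived in 8.0), so a common workaround is to claim rows by stamping them with a grabber id in an UPDATE and then reading back only the rows you won. A rough sketch of the pattern, with sqlite3 standing in for MySQL and hypothetical column names loosely echoing the linked file:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE job (id INTEGER PRIMARY KEY, status TEXT, grabber_id INTEGER)"
)
for _ in range(5):
    conn.execute("INSERT INTO job (status) VALUES ('claimed')")

def grab(conn, worker_id, limit=10):
    # No FOR UPDATE SKIP LOCKED: claim rows by writing this worker's
    # id into them, then select only the rows this worker won. Two
    # workers racing will simply stamp disjoint sets of rows.
    with conn:
        conn.execute(
            "UPDATE job SET status = 'grabbed', grabber_id = ? "
            "WHERE id IN (SELECT id FROM job WHERE status = 'claimed' LIMIT ?)",
            (worker_id, limit),
        )
        return [row[0] for row in conn.execute(
            "SELECT id FROM job WHERE status = 'grabbed' AND grabber_id = ?",
            (worker_id,),
        )]

grabbed = grab(conn, worker_id=1, limit=3)
```

The linked orphan-job query is then the cleanup side of this scheme: finding rows stamped by a grabber that no longer exists.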


Honest question: how is using MySQL lightweight? Many job queues I've seen use, e.g., Redis.


If the requirement is for persistence, i.e. no data loss if the queue process dies, then Redis won't fit.

EDIT: TIL Redis has the option to turn on fsync-to-disk on every write. Probably not what people are thinking of when they suggest Redis as lightweight.


Redis has persistence if you need it. But only if.


I would still call redis lightweight compared to mysql, but yes fsync always has a significant performance cost.
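For reference, the fsync-per-write option mentioned above is Redis's AOF appendfsync setting; the relevant redis.conf lines look like this (real directives, values shown for the durable-but-slow mode):

```conf
# Enable the append-only file for persistence
appendonly yes

# fsync policy: "no" (leave it to the OS), "everysec" (default
# trade-off, up to ~1s of writes lost on crash), or "always"
# (fsync on every write, durable but much slower)
appendfsync always
```

With the default everysec policy you can still lose up to about a second of acknowledged writes on a crash, which is the catch when Redis is proposed as a durable queue.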


Similar to the AppEngine taskqueue. Nice job! https://cloud.google.com/appengine/docs/standard/java/taskqu...


FYI: Tasks are finally being decoupled from App Engine into their own standalone service (similar to Datastore before it).

Currently Alpha: https://cloud.google.com/sdk/gcloud/reference/alpha/tasks/

(I work for GCP)


A bit off topic, but do you know if there has been any discussion to bring the search API to a standalone service?

https://cloud.google.com/appengine/docs/standard/java/search...


Not sure, and I can't talk about things that aren't public anyway :)

GCP and Elastic did partner to offer hosted Elasticsearch though it's not a fully managed service: https://www.elastic.co/about/partners/google-cloud-platform


Having the number of parallel consumers configured per-queue (as opposed to consumers dynamically being able to join and leave) seems like it imposes many of the same restrictions that make Kafka less than ideal at being a job queue.

Basically, this bites when you have messages that take different amounts of time to process, or when you need to quickly scale the number of consumers on a queue in response to volume.

How well does the "update max_workers" queue-modification command work in situations of very high message volume and/or high consumer counts?


Hm, I've been looking for a lightweight, language-neutral job queue. RDBMS-backed is my preference since I don't need massive scalability, but do want persistence that's easy to reason about (so not Redis), transparency (so not beanstalkd) and a long-lived history (so not most of the other ones).

But an RDBMS only makes sense if you can use your existing installation, so MySQL-only is a nonstarter.


orphan_jobs.sql: SELECT job_id FROM `{{.JobQueue}}` WHERE status = 'grabbed' AND grabber_id != CONNECTION_ID() LIMIT 1000

Can it break when MySQL's thread ID is reused and a new client gets the same CONNECTION_ID as a previously failed worker?


Interesting, but I'd like to know why there is a polling_interval. Given that there is a master server for each dispatch queue, one could expect a design where polling is unnecessary.


When would a push queue like this be preferable over a pull queue?


I may not be reading the documentation correctly, but I don't think this is a push queue; it uses the term "push", I think, to describe delivering messages to a queue and not as it is traditionally used to describe handing messages to consumers.

In answer to your question generally: push-based models, while more complex, tend to be higher performance (by dint of improving throughput: the broker can push messages to a consumer while it's working on other things rather than waiting for a "gimmie" request; the broker can also coordinate when and how it delivers messages to which consumers for maximum performance, which can lead to significant speedups in high throughput situations).

A very powerful pattern that gives a reasonable amount of control in a push-based setup is combining push delivery with client acknowledgements and a client "window" of messages that the client may or may not have noticed yet, especially if messages can be taken back from that window programmatically when a consumer is slow or dead. This is what RabbitMQ/AMQP calls "QoS".
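A toy sketch of that window idea (pure Python, not RabbitMQ's actual mechanics): the broker pushes up to `window` unacknowledged messages to a consumer, and each ack frees a slot for the next push.

```python
from collections import deque

class PushBroker:
    """Toy broker: pushes at most `window` unacked messages to a
    consumer callback, roughly like AMQP basic.qos prefetch.
    Hypothetical sketch for illustration only."""

    def __init__(self, consumer, window=2):
        self.consumer = consumer      # called as consumer(tag, body)
        self.window = window
        self.backlog = deque()        # published but not yet pushed
        self.unacked = set()          # delivery tags awaiting ack
        self._next_tag = 0

    def publish(self, body):
        self.backlog.append(body)
        self._pump()

    def ack(self, tag):
        self.unacked.discard(tag)
        self._pump()                  # a slot opened; push more

    def _pump(self):
        # Push while the consumer's window has room.
        while self.backlog and len(self.unacked) < self.window:
            tag = self._next_tag
            self._next_tag += 1
            self.unacked.add(tag)
            self.consumer(tag, self.backlog.popleft())

delivered = []
broker = PushBroker(lambda tag, body: delivered.append(body), window=2)
for i in range(5):
    broker.publish(f"job-{i}")
# Only two jobs have been pushed; acking the first frees a slot.
broker.ack(0)
```

A slow consumer simply stops acking, and the broker naturally stops pushing to it; recovering unacked tags from a dead consumer is the "taking messages back" part.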

In my experience, push-based messaging models should typically not be adopted up front, unless throughput requirements are known to require such a model. The added complexity (mental and in code) of managing a push-based queue model is very rarely worth investing in at the beginning of development.

Furthermore, it's possible to have extremely high performance in a simpler pull-based model, provided you make some tradeoffs. This is what Kafka does.

I would recommend switching from pull to push only when it becomes necessary (though this can be a non-trivial amount of effort depending on how tightly integrated your code is with your messaging system). RabbitMQ/ActiveMQ/Redis/Resque as brokers will shine here, since they all support both pull models ("get" in AMQP) and push models.


According to its documentation, this does deliver messages to consumers.

I'm just struggling to imagine a use case where a push queue would be preferable over a pull queue. I'm sure they exist, I've just never encountered one before. Seems like the major difference is centralized throughput control, which would allow you to minimize variance in message processing latency. There are similar use cases in e.g. operating systems for minimizing latency variance for better UI responsiveness, but I can't think of any concrete use cases for using this in high scale backend queues.


How does it compare to e.g. ActiveMQ? http://activemq.apache.org/


The title fits perfectly the description of ZeroMQ[1]. Except that I actually trust ZMQ to deliver on the promise.

[update] Oh, it's "job queue system", not a generic "queue system". So it's not quite a perfect fit, I guess :)

[1]http://zeromq.org/


Yeah, a better example would be https://github.com/zeromq/malamute


Well, my original message was a bit snarky, but what I meant was that, at the very least, this project advertises the wrong features. I mean come on, "lightweight" and "high-performance"? Maybe advertise safety properties, resilience, I don't know... but the way it is built, it can't possibly distinguish itself on the "high performance" front. As for "lightweight"... you can use ZMQ for intra-process communication. That's lightweight - not using an external RDBMS.

I'm not saying this project is bad - I have no way of knowing. It might have legitimate use cases where it excels. But what it advertises can't be true.


"Lightweight" is really becoming a turnoff for me when people use it in their description. It (mostly) means "I haven't worked on this long enough to add the features all the other projects in this space immediately found were too useful to go without" or even "I haven't worked on this long enough to produce much code".

"Lightweight" is really only interesting to me in two cases: First, you're designing it for limited resources, like an embedded system, for which the standard answer is simple too large to even consider. Second, when the standard answer in the field is so "heavy" (an ill-defined term itself, but moving on) that it causes problems of its own. JVM solutions sometimes get to this point, where the act of administering the solution itself gets bogged down in merely administering the JVM.

I do not personally have the problem that my job queues are too heavy, nor have I heard anyone else complain that ZMQ or Redis are just so heavy for what they do.


This is a cool idea, but having the worker use HTTP is just asking for problems. What happens if the HTTP connection times out? In a worker architecture, some jobs can be expected to take north of 5 minutes. HTTP will time out by then, but the worker will still keep on going.
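The failure mode is easy to reproduce with the standard library (timings shortened: a 2-second "job" against a 0.5-second client timeout stands in for the 5-minute case):

```python
import threading
import time
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

class SlowJob(BaseHTTPRequestHandler):
    def do_GET(self):
        time.sleep(2)                 # stand-in for a long-running job
        try:
            self.send_response(200)
            self.end_headers()
            self.wfile.write(b"done")
        except ConnectionError:
            pass                      # client already gave up

    def log_message(self, *args):     # silence request logging
        pass

server = HTTPServer(("127.0.0.1", 0), SlowJob)
threading.Thread(target=server.serve_forever, daemon=True).start()
url = f"http://127.0.0.1:{server.server_port}/"

try:
    urllib.request.urlopen(url, timeout=0.5)
    outcome = "completed"
except OSError:                       # covers socket timeouts and URLError
    # The dispatcher sees a timeout, but the worker is still busy
    # running the job - so the job may be retried and executed twice.
    outcome = "timed out"
```

This is why long jobs over HTTP usually need an ack-then-poll design (return 200 immediately, report completion separately) rather than holding the request open.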


Aren't there long-lived HTTP connections? E.g., https://developer.twitter.com/en/docs/tweets/filter-realtime...


You’re mentioning HTTP as an issue when the persistence layer behind the queue is based on MySQL, which does not seem to be in any way relevant here.


> It is built on top of RDBMS (MySQL),

Can I hear more about the rationale behind this?


I'm not involved in the project, but you do this if you want the same robustness as an RDBMS: e.g., if a client gets an acknowledgement back from the queue, then the data is definitely captured and won't be lost if the queue process then crashes.


I agree that it makes sense in certain circumstances and definitely depends on requirements (not only the transactional enqueues you mentioned, but often also a need for O(lg n) lookup/update/deletion of in-flight jobs), and I've seen a few MySQL-backed queues handle fairly high throughput in production. I'm mostly interested in why this particular project does it.


Could this work on Heroku?


How is the performance, and how well does it scale horizontally/vertically?

Could it be used as a replacement to Kafka message queue/ring?

(me is still looking for a Kafka-like piece coded in Go)


Have you taken a look at nats? (https://nats.io/)

Otherwise, the cloud native landscape has some similar projects listed in the same category as kafka: https://github.com/cncf/landscape




