Ever since reading Enterprise Integration Patterns [1] a few years ago, I'd been under the impression that message queues are the holy grail for decoupling systems that you want to pull apart, but this offers a really nice perspective on where they might not be a panacea. Thank you!
There's a couple of nice ideas in here (the "message pump" for one at least) that I'm going to steal and use at work. It's also comforting to know that we're not completely crazy for using a DB to store tasks/processes that need to be run and retried if necessary instead of a message queue.
I know what you meant by saying you use a DB _instead_ of a message queue, but I think it points to a broader point - a message queue is a concept, and the technology used to implement it is just an implementation detail. It just turns out that databases have all kinds of useful and powerful operations that make implementing message queues relatively easy.
If you don't need high throughput and otherwise don't have a reason to add a separate technology to the stack just to have a queue, SQL can make a great, simple and reliable queue!
* PostgreSQL and Oracle with SKIP LOCKED
* MS SQL Server with READPAST
* DB2 with SKIP LOCKED DATA
If you use MySQL, it's going to be a bit more difficult.
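For the databases that do support skipping locked rows, the claim query is short. A minimal sketch on Postgres - the `jobs` table, its columns, the connection string, and `do_work()` are all made up for illustration:

```python
import psycopg2

conn = psycopg2.connect("dbname=app")  # hypothetical connection string

with conn:  # psycopg2: commits on success, rolls back on exception
    with conn.cursor() as cur:
        # Claim the next pending job. SKIP LOCKED makes concurrent
        # workers skip rows another worker has already locked,
        # so they never block on each other.
        cur.execute("""
            SELECT id, payload
            FROM jobs
            WHERE status = 'pending'
            ORDER BY id
            LIMIT 1
            FOR UPDATE SKIP LOCKED
        """)
        row = cur.fetchone()
        if row is not None:
            job_id, payload = row
            do_work(payload)  # hypothetical; runs inside the same transaction
            cur.execute(
                "UPDATE jobs SET status = 'done' WHERE id = %s", (job_id,)
            )
```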
Using the database as a queue is an anti-pattern. There used to be an entire blog series about why this is the case. However, I can't find it at the moment.
With MySQL, there's a fairly simple trick you can do to build a queue: set a temp var to NULL, run an UPDATE with a SELECT subquery that updates the state of the next row(s) while capturing their ids, then SELECT that var to get your queue rows' PKs.
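If I understand the trick right, a single-row version looks something like this. Table and column names are invented; the `id = (SELECT @claimed_id := id)` assignment is the part that smuggles the PK out of the UPDATE:

```python
import mysql.connector  # assumes mysql-connector-python

conn = mysql.connector.connect(database="app")  # hypothetical connection
cur = conn.cursor()

cur.execute("SET @claimed_id := NULL")
# The no-op assignment `id = (SELECT @claimed_id := id)` captures the PK
# of the row being updated, since MySQL's UPDATE has no RETURNING clause.
cur.execute("""
    UPDATE jobs
    SET status = 'claimed',
        id = (SELECT @claimed_id := id)
    WHERE status = 'pending'
    ORDER BY id
    LIMIT 1
""")
cur.execute("SELECT @claimed_id")
(job_id,) = cur.fetchone()  # NULL/None if no pending row was found
conn.commit()
```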
Having been on the receiving end of more than a few apps where the devs used a database as a queue, I say "please don't".
Yes, Oracle's AQ product is built on top of an RDBMS. And yes, early versions of Microsoft's MSMQ used SQL Server. But most dev teams use the database as a shortcut and don't invest the effort in making their hacktastic DB queue work like a real product.
I have used DB as queues very successfully in multiple projects (as well as "real" queues in others), so your "please don't" is not very persuasive. Some actual argument, instead of content-free "hacktastic" vs. "real product" designations would be appreciated.
And you use "shortcut" as if it were some kind of negative attribute. Of course they use the DB as a shortcut, and they should - it works well and even has additional features (transactions with your primary store) that you wouldn't get with a separate queue product.
Monitoring and administration are more difficult until you write them yourself, especially if you are splitting monolithic code. Many of the message queue products you download include these. Can your DB message queue also support different queue concepts and message routing?
In my case it supported multiple queues (also, a concept of hierarchical queues), but no message routing. As for administration and monitoring - you are correct, you don't get it out of the box. Well, you do get some basic administration for free from whoever administers your database :)
On the other hand, you can grow a solution for those organically over time to fit your needs.
In a previous life I was on a team that moved an ecommerce backend's Email Service Provider (ESP) integration from a Postgres-based queue to RabbitMQ. The motivation was voiced in language similar to yours; "hacktastic" was actually used more than once as a reason to replace the DB with RabbitMQ.
In the end, I'd say they were equally reliable, but from a data analyst's perspective I very much missed the Postgres-based queue. I think the mechanics inherent to a database solution prompted the original implementers to keep messages around over time, vs. the mechanics inherent to an MQ solution, where they became ephemeral (on the sender's side). Having access to those messages was a treasure trove for troubleshooting and for analytics. That, and from a pure cost/benefit stance, sending 10-20k messages a day to the ESP definitely didn't necessitate the expensive rearchitecture, as the DB solution was more than capable of handling the load on cheap hardware.
The reason to use message queues rather than an RDBMS for queue behaviors isn't that queues do stuff databases don't, but rather connection cost. There's a lot of complicated handling around the database connection, whereas queue connections tend to be very inexpensive.
As an alternate anecdote, I have had far more issues with MQ connections than with RDBMS connections. I don't think there is a clear advantage to messaging from that standpoint.
Do you mean development/complexity cost or performance cost? If it's the former, then I don't really see it. If it's the latter, then yes - an RDBMS-backed queue is probably going to hit scaling limits earlier than a dedicated product.
You mean now you CAN monitor the queue... it's been very helpful for us to just throw alerts on queue size in AWS Simple Queue Service... but we have some apps keeping state in Mongo, and you have to write a custom script each time to query it and send the data to CloudWatch.
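For what it's worth, the custom script doesn't have to be much. A sketch with pymongo and boto3 - the database, collection, field, and metric names are all made up:

```python
import boto3
from pymongo import MongoClient

# Count the items still waiting in the Mongo-backed "queue".
pending = MongoClient().myapp.tasks.count_documents({"status": "pending"})

# Push it as a custom CloudWatch metric, so the same alarm setup used
# for SQS queue depth works here too.
boto3.client("cloudwatch").put_metric_data(
    Namespace="MyApp/Queues",
    MetricData=[{
        "MetricName": "PendingTasks",
        "Value": float(pending),
        "Unit": "Count",
    }],
)
```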
Using message queues to decouple components suffers from the same problem as microservices: you start to get a lot of implicit dependencies that you must document.
Beyond that, you often realize you haven't made the logical split of components in quite the right place, or the "right place" for that split changes over time. Then you've got the fun task of moving functionality from one component to another, which is extremely expensive in development time.
(or, more often than not, because of the cost of doing this, you end up putting up with a slightly batshit-insane design...)
Or in my experience, Service A sends a message to Service B, which sends a message to Service C, which sends a message to Services D, E, and F, and Service F sends another message to Service C, which this time it sends a message to Service G (so at least it's not completely circular), which then hits the database and returns information back up the chain.
I'm exaggerating a little bit, but not too much (actually, on further reflection, I might be downplaying it a bit: some of our services schedule tasks with like 10 different services for every item processed, and we do tens of thousands a day).
Debugging issues in this mess is not fun, because there's so many places you need to check to see if it's the source of the failure or not, and a failure in one service could really be in a different service so you have to test all the way up and down the chain. For every bug.
But I was under the impression that the point of message brokering is that it doesn't have to be service A that puts it there; only that some service puts the message there.
I feel like the article is attacking message brokering by discussing the disadvantages of bad use cases for them. A good use case for message brokers is when work needs to be done on an item, but not immediately.
My company uses them in a way I believe is quite effective. We pull in data from an external source, and send the ID of the item to about five different queues to do different tasks. Each time one of the queues finishes the work, it sends a message to a validator that checks to see if all the work is done. If it isn't, it waits to get that message again from another worker. If it is, it marks that item as ready for the end user.
These are the kinds of use cases I think message brokers should be used for. Not to send a message and wait to get an answer back. Why not just use an HTTP request for that?
> But I was under the impression that the point of message brokering is that it doesn't have to be service A that puts it there; only that some service puts the message there.
It doesn't matter, but it's more about debugging. If Service A does not work correctly, where is the bug? Is it Service A? Service B? The network?
The use at your company is idiomatic to the paradigm. You have n different units of work that can run separately, so you do that and communicate with messages.
If you are using messages correctly, it shouldn't be difficult to debug. You have an input and output for each service and you see where something happens differently than expected. I'm not sure where you are saying the difficulty comes from.
That matches my experience — I've seen a fairly common learning process where someone adopts a message queue, decides it's great and uses it all over the place, and then spends a while working through the various failures which didn't happen in their development/testing environment, so they're making decisions about how to handle dropped messages, duplicates, etc. in a rush.
It's not that hard to do, but it seems to take people by surprise, and it's not helped by some poor defaults, like almost everything in the RabbitMQ ecosystem silently blocking when the queue fills up. (That probably happened transiently multiple times before it hit the level where it caused a visible outage, but how many people will notice if the default isn't to log or raise an error?)
I've been working in a related space for a few years and wanted to offer a few counterpoints to the article.
First, if you're doing request/response using messaging, you're probably doing it wrong. Pub/sub and request/response are totally different animals. I, for one, consider it both reasonable and necessary to use both side-by-side, in the same infrastructure. (Is this view uncommon?)
In our technology stack, which is a monolith-becoming-microservices, we use both pub/sub and request/response side-by-side. The general rule: when service A calls service B, if the nature of that interaction is such that service B's response can preempt/interrupt/influence service A, the call needs to be made inline, in service A. If the interaction is more "advisory", use pub/sub.
Examples (from the hotel booking space):
(a) When a reservation gets canceled, we publish a cancellation event. The reservation is then CANCELED, officially. A separate service sees the cancellation event and frees the associated held room inventory; that's a pub/sub interaction.
(b) When a reservation wants to check in to a room, we check whether the room is already occupied. This has to be done using request/response (in our case, gRPC) because if the room is occupied, that's a hard gate on the success of the checkin.
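A rough sketch of the two shapes, with pika standing in for the pub/sub leg and a plain HTTP call standing in for our gRPC check (all names here are invented):

```python
import json
import pika
import requests

# (a) Advisory: publish the cancellation event and move on. Whoever
# cares about freed inventory subscribes; we don't wait for them.
conn = pika.BlockingConnection(pika.ConnectionParameters("localhost"))
ch = conn.channel()
ch.exchange_declare(exchange="reservations", exchange_type="topic")
ch.basic_publish(
    exchange="reservations",
    routing_key="reservation.canceled",
    body=json.dumps({"reservation_id": "R123"}),
)

# (b) Gating: the answer decides whether check-in proceeds, so the
# call has to be inline and synchronous.
resp = requests.get("http://rooms-service/rooms/401/occupancy")
if resp.json()["occupied"]:
    raise RuntimeError("Room occupied; check-in blocked")
```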
Second, pub/sub != work queues.
Pub/sub is about distributing small bits of information all over the system and letting things be advised of stuff. Using a messaging system as a work queue is overall pretty stupid. I know it's common to use a message broker like RabbitMQ for task distribution, but it's silly. It's really silly when the tasks themselves contain huge binary objects inline, as part of the message payload. Store that shit in S3 or a proper system, and keep the message payloads light.
A pub/sub relationship should convey metadata of state change. Any state change in the system should be communicated via pub/sub.
A subscriber might need to use a request/response interaction with some other microservice in order to act on a pub/sub message, but that's not a state change; that's just auxiliary data it needs to do the job the state change triggered.
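Concretely, the "keep payloads light" point is usually called the claim-check pattern: park the blob, publish a pointer plus the state-change metadata. A sketch - the bucket, exchange, and routing key are invented, and the exchange is assumed to already exist:

```python
import json
import uuid
import boto3
import pika

big_binary_blob = b"\x00" * (10 * 1024 * 1024)  # stand-in for a huge artifact

# Park the heavy payload in S3...
key = f"artifacts/{uuid.uuid4()}.bin"
boto3.client("s3").put_object(Bucket="my-app-blobs", Key=key, Body=big_binary_blob)

# ...and publish only lightweight state-change metadata pointing at it.
ch = pika.BlockingConnection(pika.ConnectionParameters("localhost")).channel()
ch.basic_publish(
    exchange="events",
    routing_key="artifact.ready",
    body=json.dumps({"blob": f"s3://my-app-blobs/{key}"}),
)
```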
> Pub/sub and request/response are totally different animals. [...] (Is this view uncommon?)
One model is synchronous, the other is asynchronous. I don't see why anybody would have any doubts.
> Second, pub/sub != work queues.
The publish/subscribe model doesn't say anything about preserving and acknowledging messages, and with work queues usually only one worker takes the queued job. I don't see why anybody would mistake one for the other.
This article is not just very well written, it's also very funny. Some gold:
> a message broker is a service that transforms network errors and machine failures into filled disks
...
> mark it as required in the database, and wait for something else to handle it.
>
> Assuming that something else isn’t a human who has been paged.
...
> Systems grow by pushing responsibilities to the edges
...
> A distributed system is something you can draw on a whiteboard pretty quickly, but it’ll take hours to explain how all the pieces interact.
I love it when you can mix technical content with not taking yourself too seriously.
Although be careful with using a DB as a task queue. Concurrency is a b* and message brokers are very good at it. AMQP was created because the authors started with a DB-backed message broker and it didn't work.
A task queue is message broker + persistence + status. Celery does that very well in the Python world, and works with RabbitMQ, Redis, Postgres, etc.
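A minimal sketch of that combination - broker/backend URLs and the task body are placeholders:

```python
from celery import Celery

# Redis as the broker, Postgres (via SQLAlchemy) as the result backend:
# persistence + status layered on top of plain message passing.
app = Celery(
    "tasks",
    broker="redis://localhost:6379/0",
    backend="db+postgresql://localhost/celery_results",
)

@app.task(bind=True, max_retries=3)
def send_welcome_email(self, address):
    try:
        deliver(address)  # hypothetical mail-sending call
    except IOError as exc:
        # Retry with a delay; task state stays queryable in the backend.
        raise self.retry(exc=exc, countdown=60)
```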
What amazes me the most is Autobahn + Crossbar.io. It does pub/sub, RPC, load balancing and all that stuff for Python, JS, PHP, C#, Java... And it even works in the browser. Cool stuff.
I have to say, I spent way too much time trying to understand what b* trees have to do with concurrency, and what system you are using that implements them.
It seems to me that there are great benefits to be had if you can keep the transactional integrity of direct connections to a relational database like PostgreSQL. The performance and scalability are often better than people expect, because the data stays closer to the CPUs and you avoid moving too much across multiple nodes on the relatively slow and unreliable network.
In a lot of cases, there are natural separation lines such as between groups of customers you can use to shard and scale things up. Unless you are building something like a social network where everything is connected to everything, you don't need a database that runs over a large cluster or clustered queues in between components. These are often just more moving parts that can break.
The benefits of transactional queues in your database are hard to overstate; commit the result of the task in the same transaction as you commit the queue update. Don't worry about idempotency, lost messages, or duplicate messages.
I suspect the advice to avoid it because of performance has become invalid for all but extreme use-cases. My company has dozens of high activity 1-100M item queues in single postgresql databases. It works great.
If you segment your databases into microservices, I think you lose transactions. Say two services talk via a DB queue. You'll need to store the DB for each service on the same database server, in different schemas. If you ever want to move the DB for one service, you have to introduce distributed transactions. That, or drop them.
A queue is not the holy grail either, especially if you want to put something on the queue during a DB transaction. Be aware that once you do I/O (some RPC or queue interaction) inside the transaction, you lose the ACID guarantees in the DB, and you make the system much less reliable this way.
The way to do this is to publish to the queue _after_ the transaction commits, while also making sure the action won't be lost in the process.
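The usual name for "publish after commit without losing it" is the transactional outbox: write the pending event into the DB in the same transaction, and have a relay publish it afterwards. A sketch - the table names and the publish() helper are invented:

```python
import json
import psycopg2

conn = psycopg2.connect("dbname=app")  # hypothetical connection

# 1) Record the business change AND the pending publication atomically.
with conn, conn.cursor() as cur:
    cur.execute("UPDATE orders SET status = 'paid' WHERE id = %s", (42,))
    cur.execute(
        "INSERT INTO outbox (topic, body) VALUES (%s, %s)",
        ("order.paid", json.dumps({"order_id": 42})),
    )

# 2) A separate relay publishes committed outbox rows, then deletes them.
#    At-least-once: a crash between publish and delete means a duplicate,
#    so consumers still have to tolerate redelivery.
with conn, conn.cursor() as cur:
    cur.execute(
        "SELECT id, topic, body FROM outbox ORDER BY id LIMIT 10 FOR UPDATE SKIP LOCKED"
    )
    for row_id, topic, body in cur.fetchall():
        publish(topic, body)  # hypothetical broker call
        cur.execute("DELETE FROM outbox WHERE id = %s", (row_id,))
```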
The term SOA is always mentioned with the underlying connotation that a client requests a resource and somehow there is an extra step of "service discovery", which has its own set of problems that branch out into various fields of physics and mathematics.
When I see such branching complexity, I often think that the architecture is somehow backwards, and try a simple reversal of responsibilities. In this case, the services would be looking for a job to complete, in effect turning the aforementioned step into "job discovery", which is indeed the kind of architecture I've been applying for the past decade.
Seems to be working well so far. Back pressure is handled at the front gate, since none of the services picks up the job; involuntary synchronization is still a problem, but avoidable by cleverly re-ordering the job queue. Job completion is communicated back to the front gate through pub/sub, and the anecdotal evidence so far has been great.
Dealing with fairly low-volume stuff, my concern about using the database alone has to do with who owns what schema and how changes are managed.
Being able to send off a rich asynchronous message is nice because it means you do not need to have some co-owned table in a shared database that two different components are reading and writing from.
Or, worse, a widespread pattern of every service exposing a piece of its database to other services with an unsatisfying level of logging or control for what really happens.
Hmm, I submitted this yesterday, but it got lost in the scrum.
It's always interesting to hear real reports from the trenches that aren't essentially ads for technology XYZ. I'm afraid that all too often we make things more complicated for bad reasons, whether ignorance, chasing the latest trend, or resume-driven development...
Message-oriented stuff has been around a long time, so I don't think it's a fad.
It's basically taking the concept of "integration" (or API) itself and creating a product for it, just as a database is a product for the concept of persistence. Thus just as not every application has to reinvent a database, with a messaging product not every product has to reinvent queueing up integration calls if the target system isn't available.
The article also mentions request-reply being "what you really want". I think that this is a) not true, as a lot of the time you can fire and forget, and b) when you need it, the products generally provide a request-reply API on top of their lower-level APIs. No need to reinvent.
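For RabbitMQ, that request-reply layering is conventionally a private reply queue plus a correlation id; roughly like this, following the standard pika pattern (the request queue name is invented):

```python
import uuid
import pika

conn = pika.BlockingConnection(pika.ConnectionParameters("localhost"))
ch = conn.channel()

# Exclusive, server-named queue to receive the reply on.
reply_queue = ch.queue_declare(queue="", exclusive=True).method.queue
corr_id = str(uuid.uuid4())

ch.basic_publish(
    exchange="",
    routing_key="rpc_requests",  # the responder consumes this queue
    properties=pika.BasicProperties(reply_to=reply_queue, correlation_id=corr_id),
    body=b"is room 401 occupied?",
)

# Block until the reply matching our correlation id shows up.
for method, props, body in ch.consume(reply_queue):
    if props.correlation_id == corr_id:
        print(body)
        break
```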
I got the impression that the author didn't really want a message-based system at all, but rather a request-response system that they tried to implement within a message broker. Of course that's going to create more headaches than it solves; it's the wrong solution.
Both message brokers and request-response code have their places in distributed systems, but they really need to learn when each is appropriate.
I agree, saying that request-reply is "what you really want" was kind of silly, especially after the opening paragraph that states "it depends".
In my eyes (where the GP makes perfect sense), a message broker is asynchronous; there's no implicit wait while your consumers work on your request. A request-response interface will stop the producer until the consumer is done.
[1] https://en.wikipedia.org/wiki/Enterprise_Integration_Pattern...