I agree, but there are a lot of footguns with RMQ. A great example of one is tha...

zbentley · 2024-09-18T20:19:42 1726690782

That was true, but the more recent Quorum queues provide more traditional scalability tradeoffs (you can set a ceiling on the number of synchronous replication hops that a published, replicated message goes through): https://www.rabbitmq.com/docs/quorum-queues
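
For anyone who wants to try this, here is a minimal sketch of declaring a quorum queue from Python. It assumes a pika-style channel object with a `queue_declare()` method; the helper names are made up for illustration. The optional `x-quorum-initial-group-size` argument is my reading of how you bound the number of replicas, so check it against the linked docs before relying on it.

```python
def quorum_queue_arguments(initial_group_size=None):
    """Build the optional arguments for declaring a quorum queue."""
    args = {"x-queue-type": "quorum"}
    if initial_group_size is not None:
        # Assumption: this limits how many cluster nodes hold a replica.
        args["x-quorum-initial-group-size"] = int(initial_group_size)
    return args


def declare_quorum_queue(channel, name, initial_group_size=None):
    """Declare a durable quorum queue on a pika-style channel (hypothetical helper)."""
    # Quorum queues have to be durable, so durable=True is not optional here.
    return channel.queue_declare(
        queue=name,
        durable=True,
        arguments=quorum_queue_arguments(initial_group_size),
    )
```

With pika this would be called as `declare_quorum_queue(connection.channel(), "orders", initial_group_size=3)`, where `"orders"` is just a placeholder queue name.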

nurettin · 2024-09-18T21:08:42 1726693722

The most common footgun, and the elephant in the room, is buggy processes letting queues grow.

RMQ immediately slows down (due to mnesia causing delays) and processes start dropping messages even though the system still has resources to spare.

pas · 2024-09-19T08:01:39 1726732899

Can you elaborate on the details? I have some memories of running OpenStack where Rabbit "was slow", but we never figured out why. Is mnesia the storage layer?

nurettin · 2024-09-19T08:32:15 1726734735

Yes, it was using mnesia as the storage layer, and if I had a few dozen queues with a few hundred messages each, it caused timeouts in some clients (celery/kombu is an example).

I decided to add expiry policies to each queue so that the system cleans out stale messages by itself, and that fixed all the message-dropping issues.
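
Roughly, a policy like that can be set through the management HTTP API. The sketch below only builds the request (URL plus JSON body) for a `message-ttl` policy; the function name, policy name, pattern, TTL value, and `localhost:15672` base URL are all placeholders, not what I actually ran in production.

```python
import json
from urllib.parse import quote


def ttl_policy_request(vhost, policy_name, pattern, message_ttl_ms,
                       base_url="http://localhost:15672"):
    """Return (url, json_body) for a PUT that creates a message-TTL policy."""
    # The vhost must be percent-encoded, so the default vhost "/" becomes "%2F".
    url = "{}/api/policies/{}/{}".format(
        base_url, quote(vhost, safe=""), quote(policy_name, safe="")
    )
    body = {
        "pattern": pattern,                # regex matched against queue names
        "definition": {"message-ttl": int(message_ttl_ms)},
        "apply-to": "queues",
    }
    return url, json.dumps(body)


url, body = ttl_policy_request("/", "expire-stale", ".*", 60000)
print(url)
print(body)
```

Send the result as an authenticated PUT with `Content-Type: application/json` using whatever HTTP client you prefer. The same definition should also work with `rabbitmqctl set_policy`.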

The 4.0 changelog states that they are switching to a new k/v store (moving it from experimental to default).

pas · 2024-09-19T09:02:13 1726736533

Thanks for the details!

Yep, similar symptoms. (OpenStack's services are also written in Python, or at least were back then, so probably similar to Celery.) We had regular problems with RMQ restarting. (Unfortunately I can't recall whether it was due to OOM or just some BEAM timeout.)

A few hundred messages in a few dozen queues seem ... inconsequential. I mean whatever on-disk / in-memory data structure mnesia has should be able to handle ~100K stale messages ... but, well, of course there's a reason they switched to a new storage component :)

rhodin · 2024-09-19T17:15:04 1726766104

Mnesia is _not_ the storage layer for messages (except for delayed messages).

Mnesia stores vhosts, users, permissions, queue definitions and more. This is being transitioned to Khepri, which improves a lot of things (maybe most importantly netsplits) but not directly message speeds.