The article is a bit unclear because it's lacking the proper vocabulary. Priorities and deadlines (what the article calls "SLOs") are both valid ways to approach scheduling problems with different tradeoffs.
The fixed priority systems the article talks about trade off optimal "capacity" utilization for understandable failure dynamics in the overcapacity case. When you're over capacity, the messages that don't go through are the messages with the lowest priority. It's a nice property and simple to implement.
What the article proposes is better known as deadline scheduling. That's also fine and widely used, but it has more complicated failure dynamics in the overcapacity case. If your problem domain doesn't have an inherent "priority" linked to the deadlines, that may be acceptable, but in other cases it may not be.
Neither is inherently better and there's other approaches with yet different tradeoffs.
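To make the two policies concrete, here's a minimal sketch. Both reduce to "pop the smallest key" from a heap; only the key differs (priority number vs. deadline). Job names and numbers are made up for illustration:

```python
import heapq

# Fixed priority: the queue is ordered by a static priority number.
# Deadline (EDF) scheduling: the queue is ordered by absolute deadline.

def push(queue, key, job):
    heapq.heappush(queue, (key, job))

def pop(queue):
    return heapq.heappop(queue)[1]

# Fixed priority: lower number = more important.
pq = []
push(pq, 2, "batch report")
push(pq, 0, "user request")
push(pq, 1, "email send")

# EDF: key is the deadline (seconds from now).
edf = []
push(edf, 3600.0, "batch report")  # due in an hour
push(edf, 5.0, "user request")     # due in 5 seconds
push(edf, 60.0, "email send")      # due in a minute

assert pop(pq) == "user request"   # wins on priority
assert pop(edf) == "user request"  # wins on deadline
```

The failure-dynamics difference only shows up when jobs keep arriving faster than they can be served; with spare capacity the two orderings often look identical.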
I've seen enough versions of these systems to reach the conclusion that the best you can do is shift lower-priority work off-peak, but you still need to be overprovisioned, or else a p99 event will stall the system with too much work. People will talk about priorities, but on the scale of one day, almost everything is high priority.
In multitenant situations this is even more complicated, because I may have a time slice of the machine I'm "entitled" to, and within that I have priorities with which I want to allocate things. If one of my tasks can't complete, I hope it's to the benefit of another of my priorities. Not someone else's.
I suspect you ultimately need a little language for this. Deciding priorities for frames in a video stream is already complex, and that's trivial compared to the sorts of scenarios Conway's Law introduces.
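One way to sketch that "my lost slot should go to my other priorities, not someone else's" property is two-level scheduling: pick the tenant furthest below its entitlement first, then apply that tenant's own priorities. All names and shares here are illustrative, not from any real system:

```python
import heapq

class Tenant:
    def __init__(self, name, share):
        self.name, self.share = name, share
        self.used = 0      # slots this tenant has consumed
        self.jobs = []     # min-heap of (priority, job)

    def deficit(self, total_slots):
        # How far below its entitled share the tenant is running.
        return self.share * total_slots - self.used

def next_job(tenants, total_slots):
    # Level 1: pick the tenant furthest below its entitlement.
    tenant = max((t for t in tenants if t.jobs),
                 key=lambda t: t.deficit(total_slots))
    tenant.used += 1
    # Level 2: within that tenant, honor its own priorities.
    return tenant.name, heapq.heappop(tenant.jobs)[1]

a, b = Tenant("a", 0.5), Tenant("b", 0.5)
for tenant, jobs in [(a, ["a-hi", "a-lo"]), (b, ["b-hi", "b-lo"])]:
    for prio, job in enumerate(jobs):
        heapq.heappush(tenant.jobs, (prio, job))

# With equal shares, capacity alternates between tenants; within
# each tenant, its own high-priority job still goes first.
order = [next_job([a, b], 4) for _ in range(4)]
```

This is basically a toy version of hierarchical fair-share scheduling; a real implementation would also have to handle preemption, work conservation when a tenant is idle, and deadlines inside each tenant's queue.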
Yeah, I don't see how deadlines/SLOs do any better when there's insufficient capacity to meet the deadlines of the queued jobs at their current concurrency.
> I don’t think it’s possible to meet a latency target through prioritization when there is a fundamental lack of capacity.
This seemingly implies that it would be achievable with target deadlines/SLOs. At capacity, I don't see a better solution than giving each priority (or target latency) a defined minimum resource allocation.
Under deadline scheduling, every pending job _eventually_ has highest priority as time elapses. (Assuming new jobs can’t arrive with deadlines in the past.) Every job is eventually serviced.
The “pain” experienced in an overload situation is spread among all late jobs. Contrast this with fixed-priority scheduling, where the lowest-priority jobs will be starved completely until the overload is resolved.
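A small simulation of this difference, assuming a sustained 2x overload: one job is served per tick, but two arrive (one urgent, one relaxed). The names and deadlines are made up:

```python
import heapq

def simulate(policy, ticks=6):
    """Serve one job per tick while two arrive each tick.

    Under 'fixed', the freshly arrived high-priority job always
    outranks every waiting low-priority job, so lo-* jobs starve.
    Under 'edf', a waiting job's deadline eventually precedes the
    deadlines of new arrivals, so every job is eventually served.
    """
    heap, served = [], []
    for t in range(ticks):
        if policy == "fixed":
            # Key = (priority, arrival time); priority 0 beats 1 forever.
            heapq.heappush(heap, ((0, t), f"hi-{t}"))
            heapq.heappush(heap, ((1, t), f"lo-{t}"))
        else:
            # Key = (deadline, arrival time); hi due at t+1, lo at t+3.
            heapq.heappush(heap, ((t + 1, t), f"hi-{t}"))
            heapq.heappush(heap, ((t + 3, t), f"lo-{t}"))
        served.append(heapq.heappop(heap)[1])
    return served

assert all(name.startswith("hi") for name in simulate("fixed"))  # lo-* starved
assert "lo-0" in simulate("edf")  # late, but eventually served
```

Both policies are late on something here; they differ only in who absorbs the lateness.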
That's what I was saying: in the case of insufficient capacity, the available capacity is divided among the priorities in unequal but non-zero portions. The SLO/deadline method would effectively do something similar, as everything would be overdue and the most overdue gets highest priority to run. The only difference is that there's no unequal portioning unless there's additional logic to say that x overdue on job A is more important than x overdue on job B, which amounts to setting priorities in the end.
> The “pain” experienced in an overload situation is spread among all late jobs. Contrast this with fixed-priority scheduling, where the lowest-priority jobs will be starved completely until the overload is resolved.
Though this is often not a good way to spread out the pain.
It's probably much worse for the 15s job to be a minute late than for the 8h job to be a minute late, but basic deadline scheduling will treat them the same. So you want sharing, but uneven sharing.
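One sketch of that uneven sharing: order overdue jobs by lateness relative to the job's expected duration rather than by raw lateness. The function name and numbers below are illustrative, not a standard algorithm from the article:

```python
def relative_lateness(now, deadline, expected_duration):
    # A minute late on a 15s job hurts more than a minute late on
    # an 8h job; dividing by expected duration captures that.
    return max(0.0, now - deadline) / expected_duration

now = 100.0
# Both jobs are exactly 60 seconds overdue.
short_job = relative_lateness(now, deadline=40.0, expected_duration=15.0)     # 15s job
long_job = relative_lateness(now, deadline=40.0, expected_duration=28800.0)   # 8h job

assert short_job > long_job  # the 15s job is "more late", so it runs first
```

Scheduling by the larger relative-lateness score spreads the pain, but weighted toward jobs whose users will actually notice the delay.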
Thanks for sharing some of your insight into the problem domain. Do you happen to have any references for laymen like me to get started on the subject? It's an awfully interesting topic, but when I stumble upon it I tend to fall back to JIT learning and troubleshooting, which is far from the best position to be in.
This is a class of problems known as scheduling. The Wikipedia page is a good place to start [0]. It focuses on task schedulers rather than message queues, but the same principles apply. For a deeper theoretical basis, Wiley has a good book [1]. Most undergrad curricula will have a module on this as well, so you can find info in most comp sci textbooks.