There is a subtler issue. Even if your average input and output rates are OK, the lumpiness (stochastic variations) in input and processing rates can cause queues to build up.
In the simplest example, with "well behaved" arrival and processing rates and a single server (an M/M/1 queue), the average number of requests in the system is rho/(1-rho), where the utilization rho = avg arrival rate / avg processing rate. So as the arrival rate approaches the processing rate, the average queue length grows without bound. In practice, you want to keep the average utilization below about 80% to keep queue lengths reasonable.
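Plugging a few utilizations into that formula shows how sharply it blows up (just a quick sketch of the arithmetic):

```python
# Plug a few utilizations into the M/M/1 result L = rho / (1 - rho)
# (rho = avg arrival rate / avg processing rate).
for rho in (0.5, 0.8, 0.9, 0.95, 0.99):
    print(f"utilization {rho:.0%}: ~{rho / (1 - rho):.0f} requests in the system on average")
```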
Alongside a link to Google's explanation of why they do it, that's a very reasonable and helpful reply: "Changing the queuing method from the standard first-in, first-out (FIFO) to last-in, first-out (LIFO) [...] can reduce load by removing requests that are unlikely to be worth processing."
> Most services process queues in FIFO (first-in first-out) order. During periods of high queuing, however, the first-in request has often been sitting around for so long that the user may have aborted the action that generated the request. Processing the first-in request first expends resources on a request that is less likely to benefit a user than a request that has just arrived. Our services process requests using adaptive LIFO. During normal operating conditions, requests are processed in FIFO order, but when a queue is starting to form, the server switches to LIFO mode. Adaptive LIFO and CoDel play nicely together, as shown in figure 2. CoDel sets short timeouts, preventing long queues from building up, and adaptive LIFO places new requests at the front of the queue, maximizing the chance that they will meet the deadline set by CoDel. HHVM, Facebook’s PHP runtime, includes an implementation of the Adaptive LIFO algorithm.
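To make the mechanism concrete, here's a minimal sketch of the adaptive-LIFO idea in Python (the class name and the LIFO_THRESHOLD knob are my own assumptions for illustration, not Facebook's actual implementation):

```python
from collections import deque

class AdaptiveLifoQueue:
    """Serve FIFO under light load, switch to LIFO once a backlog forms.

    LIFO_THRESHOLD is an assumed tuning knob, not a value from the article.
    """
    LIFO_THRESHOLD = 64

    def __init__(self):
        self._items = deque()

    def put(self, request):
        self._items.append(request)          # new requests always go to the tail

    def get(self):
        if len(self._items) > self.LIFO_THRESHOLD:
            return self._items.pop()         # backlog forming: serve the freshest request (LIFO)
        return self._items.popleft()         # normal operation: serve the oldest request (FIFO)
```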
For instance, a bunch of clients all make a request to a server at the same time, briefly saturating the server. If all the clients have the same timeout without jitter, they will all retry together at the same time once the timeout expires, saturating the server again and again. Jitter helps by "spreading" those clients out in time, thus "diluting" the server load. The server can then process the requests without saturating.
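A minimal sketch of what that looks like on the client side (the base_delay and max_attempts values are arbitrary assumptions):

```python
import random
import time

def retry_with_jitter(send_request, base_delay=1.0, max_attempts=5):
    """Retry a request, spreading retries out with random jitter.

    base_delay and max_attempts are illustrative defaults, not from the thread.
    """
    for attempt in range(max_attempts):
        try:
            return send_request()
        except TimeoutError:
            # Sleep a random fraction of an exponentially growing window so that
            # clients that timed out together do not all retry together.
            time.sleep(random.uniform(0, base_delay * 2 ** attempt))
    raise TimeoutError("request failed after retries")
```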
The basic idea behind that is also used in all sorts of networks where multiple stations share the same medium and everyone is free to send. If a "collision" is detected, each station waits a random timeout before sending again, in the hope that next time there won't be another collision.
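The classic shared-medium version of this is binary exponential backoff; a rough sketch, where slot_time and the window cap are illustrative assumptions rather than values from any real standard:

```python
import random
import time

def backoff_after_collision(collisions, slot_time=0.001, max_exponent=10):
    """Wait a random number of slots in [0, 2**n - 1] after the n-th collision."""
    window = 2 ** min(collisions, max_exponent)   # double the window on every collision, up to a cap
    time.sleep(random.randrange(window) * slot_time)
```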
If you use a linked list, it's like always adding at the head of the list and always removing from the head.
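In code, that head-only discipline on a linked list is just a stack; a minimal sketch:

```python
class Node:
    def __init__(self, value, next_node=None):
        self.value = value
        self.next_node = next_node

class LifoList:
    """Push and pop at the head of a singly linked list, giving LIFO order."""
    def __init__(self):
        self.head = None

    def push(self, value):
        self.head = Node(value, self.head)   # always add at the head...

    def pop(self):
        if self.head is None:
            raise IndexError("pop from empty list")
        node, self.head = self.head, self.head.next_node   # ...and always remove from the head
        return node.value
```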
Let's say you have two clients for your server: one sends a reliable 1 request per second, and the other sends 10,000 requests every hour in a single burst. Assume your server can handle 100 requests per second with a 1 second timeout.
The LIFO will basically result in the 1 req/sec client always getting its requests answered, while most of the burst client's requests get dropped.
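Here's a toy simulation of that scenario (the timing model is heavily simplified, and I'm assuming expired requests can be discarded at negligible cost):

```python
import collections

def simulate(seconds=120, capacity=100, timeout=1):
    """Toy model of the two-client scenario above, using the numbers from the comment.

    The server pops from a LIFO stack; anything that has waited longer than
    `timeout` seconds is shed instead of being processed.
    """
    stack = []                                   # (arrival_time, client), newest at the end
    served, dropped = collections.Counter(), collections.Counter()

    for t in range(seconds):
        stack.append((t, "steady"))              # 1 request per second from the steady client
        if t == 0:
            stack.extend((t, "bursty") for _ in range(10_000))  # the hourly burst lands at t=0

        budget = capacity                        # server handles up to 100 requests per second
        while budget and stack:
            arrival, client = stack.pop()        # LIFO: newest request first
            if t - arrival > timeout:
                dropped[client] += 1             # already expired, shed it without doing the work
            else:
                served[client] += 1
                budget -= 1

    dropped.update(client for _, client in stack)  # whatever is still queued never gets served
    return served, dropped

print(simulate())
```

In this toy model the steady client loses only the one request that got buried under the burst, while nearly all of the burst is shed as expired.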