
Here's something I do regularly: I have B bits of data, where B is multiple orders of magnitude larger than the RAM I have available. I can process chunks of B in parallel with a near linear improvement in throughput, but still at an overall lower rate than I can read it from storage.

In other words, I/O is faster than CPU for this task.

A naive design, where a dedicated thread reads the data as fast as possible and dumps it into an unbounded queue consumed by one worker thread per core, will quickly run out of memory.

Putting a limit on queue depth lets the queue signal back to the storage reader that it can't accept more data. This is backpressure. The reader can then decide whether to wait, slow down, discard data, and so on, as appropriate for the use case.
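A minimal sketch of this pattern in Python, using a bounded `queue.Queue` so the reader blocks (i.e. waits) when the consumers fall behind. The chunk count, queue depth, and worker count here are made-up illustration values, and `chunk * 2` stands in for the real per-chunk processing:

```python
import queue
import threading

CHUNK_COUNT = 32   # hypothetical: number of chunks in the dataset
QUEUE_DEPTH = 4    # the bound; this is what creates backpressure
NUM_WORKERS = 2    # stand-in for "one worker thread per core"

chunks = queue.Queue(maxsize=QUEUE_DEPTH)  # bounded queue
results = []
results_lock = threading.Lock()

def reader():
    # Simulates the dedicated I/O thread. put() blocks once QUEUE_DEPTH
    # items are pending, so the reader is throttled to the consumers'
    # pace instead of buffering the whole dataset in memory.
    for i in range(CHUNK_COUNT):
        chunks.put(i)
    for _ in range(NUM_WORKERS):
        chunks.put(None)  # sentinel: no more data

def worker():
    # Pulls chunks and "processes" them; exits on the sentinel.
    while True:
        chunk = chunks.get()
        if chunk is None:
            break
        with results_lock:
            results.append(chunk * 2)  # stand-in for real processing

threads = [threading.Thread(target=reader)]
threads += [threading.Thread(target=worker) for _ in range(NUM_WORKERS)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(len(results))  # all chunks processed; at most 4 ever buffered
```

The choice of "wait" here is the default behavior of a blocking `put()`; a reader that should instead slow down or discard could use `put_nowait()` and catch `queue.Full`.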



