Wrong. If you do any of these you're scaling vertically, even by that definition:
* Replace the CPU with a faster one, but with the same number of cores. Or simply run the same one at a higher clock rate.
* Add memory, or use faster memory.
* Add storage, or use faster storage.
These are all forms of vertical scaling because they reduce the time it takes to process a single transaction, either by reducing waits or by increasing computation speed.
> It's also contradicted by both the article we're discussing and the wikipedia article
The article agrees with this definition. Transaction latency decreases iff vertical scale increases. Transaction throughput increases with either form of scaling. Without this interpretation, the analogy to conveyor belts makes no sense.
Think of it this way, instead. Building a multi-belt system is a pain in the ass that complicates the design of your factory. Conveyor belt highways, multiplexers, tunnels, and a bunch of stuff related to the physical routing of your belts suddenly become relevant. But you can still increase throughput while keeping a single belt, if your bottleneck is not belt speed but processing speed (in the industrial sense). You can have several factories sharing the same belt, which increases throughput but not latency.
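A rough sketch of that in Python (the numbers are made up; the point is that per-item latency stays fixed while throughput climbs until the belt itself saturates):

```python
# Toy model: one belt feeding N identical factories (hypothetical numbers).
BELT_CAPACITY = 15.0  # items per second the belt can carry
PROCESS_TIME = 4.0    # seconds for one factory to process one item

def throughput(n_factories: int) -> float:
    """Items per second, capped by belt capacity."""
    return min(BELT_CAPACITY, n_factories / PROCESS_TIME)

for n in (1, 2, 4, 8):
    # Latency per item is PROCESS_TIME no matter how many factories share the belt.
    print(f"{n} factories: {throughput(n):.2f} items/s, {PROCESS_TIME} s/item")
```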
Also, it's worth pointing out that increasing the number of processing units often _does_ decrease latency. In Factorio you need 3 advanced circuits for the chemical science pack. If your science lab can produce 1 science pack every 24 seconds but your pipeline takes 16 seconds to produce one advanced circuit, your whole pipeline is going to have a latency of 48 seconds from start to finish (3 × 16 s for the circuits each pack needs), due to being bottlenecked by the advanced circuit pipeline. Doubling the number of processing units in each step of the circuit pipeline will double your throughput and bring your latency down to 24 seconds, as it should be. And if you have room for those extra processing units, you can do that without adding more belts.
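The arithmetic behind those numbers, as a quick sketch (constants taken straight from the example above):

```python
LAB_TIME = 24          # seconds per chemical science pack in the lab
CIRCUIT_TIME = 16      # seconds per advanced circuit
CIRCUITS_PER_PACK = 3

def pack_interval(circuit_assemblers: int) -> float:
    """Seconds between finished packs: whichever is slower, the lab
    or the supply of the 3 circuits each pack needs."""
    circuit_supply = CIRCUITS_PER_PACK * CIRCUIT_TIME / circuit_assemblers
    return max(LAB_TIME, circuit_supply)

print(pack_interval(1))  # 48.0 -> bottlenecked by the circuit pipeline
print(pack_interval(2))  # 24.0 -> the lab is now the bottleneck
```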
The idea that serial speed is equivalent to latency breaks down when you consider what your computer's hardware is really doing under the hood, too. Your CPU is constantly doing all manner of things in parallel: prefetching data from memory, reordering instructions and running them in parallel, speculatively executing branches, et cetera. None of these things decrease the fundamental latency of reading a single byte from memory with a cold cache, but it doesn't really matter, because at the end of the day we're measuring some application-specific metric like transaction latency.
>Also, it's worth pointing out that increasing the number of processing units often _does_ decrease latency. [...]
This isn't latency in the same sense I was using the word. This is reciprocal throughput. Latency, as I was using the word, is the time it takes for an object to completely pass through a system; more generally, it's the delay between a cause and its effects. For example, you could measure how long it takes for an iron ore to be integrated into a final product at the end of the pipeline. This measure could be relevant in certain circumstances. If you needed to control throughput by restricting inputs, the latency would tell you how much lag there is between the time when you throttle the input supply and the time when the output rate starts to decrease.
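To make the two metrics concrete, here's a toy pipelined system (depth and stage time are assumed, not from the thread):

```python
STAGES = 10        # pipeline depth: stations an item passes through (hypothetical)
STAGE_TIME = 2.0   # seconds an item spends at each station (hypothetical)

# Latency: one item's full trip from raw input to finished output.
latency = STAGES * STAGE_TIME           # 20.0 s

# Reciprocal throughput: gap between consecutive finished items
# while every station is busy with a different item.
reciprocal_throughput = STAGE_TIME      # 2.0 s

# The throttling point above: cut the input now, and the output rate
# only drops `latency` seconds later, once the in-flight items drain.
print(latency, reciprocal_throughput)
```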
>The idea that serial speed is equivalent to latency breaks down when you consider what your computer's hardware is really doing under the hood, too. Your CPU is constantly doing all manner of things in parallel: prefetching data from memory, reordering instructions and running them in parallel, speculatively executing branches, et cetera. None of these things decrease the fundamental latency of reading a single byte from memory with a cold cache, but it doesn't really matter, because at the end of the day we're measuring some application-specific metric like transaction latency.
Yes, a CPU core is able to break instructions down into micro-operations and parallelize and reorder those micro-operations, such that instructions are retired in a non-linear manner. Which is why you don't measure latency at the instruction level. You take a unit of work that's both atomic (it's either complete or incomplete) and serial (a thread can't do anything else until it's completed it), and take a timestamp when it's begun processing and another when it's finished. The difference between the two is the latency of the system.
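In code, that measurement is just two timestamps around the unit of work (the transaction body here is a stand-in):

```python
import time

def process_transaction():
    # Stand-in for an atomic, serial unit of work: it's either
    # complete or incomplete, and the thread does nothing else meanwhile.
    sum(i * i for i in range(1_000_000))

start = time.perf_counter()   # timestamp when processing begins
process_transaction()
end = time.perf_counter()     # timestamp when it finishes
print(f"latency: {(end - start) * 1000:.2f} ms")
```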
> This isn't latency in the same sense I was using the word.
But it is. If you add more factories for producing advanced chips, you can produce a chemical science pack from start to finish in 24 seconds (assuming producing an advanced circuit takes 16 seconds). Otherwise it takes 48 seconds, because you’re waiting sequentially for 3 advanced circuits to be completed. It doesn’t matter that the latency of producing an advanced circuit didn’t decrease. The relevant metric is the latency to produce a chemical science pack, which _did_ decrease, by fanning out the production of a sub-component.
Edit: actually, my numbers are measuring reciprocal throughput, but the statement still holds true when talking about latency. You can expect to complete a science pack in 72 seconds (24+16*3) with no parallelism, and 40 seconds (24+16) with.
> Yes, a CPU core is able to break instructions down into micro-operations and parallelize and reorder those micro-operations, such that instructions are retired in a non-linear manner. Which is why you don't measure latency at the instruction level.
That’s what I’m saying about Factorio, though. You can measure latency for individual components, and you can measure latency for a whole pipeline. Adding parallelism can decrease latency for a pipeline, even though it didn’t decrease latency for a single component. That’s why the idea that serial performance = latency breaks down.
>actually, my numbers are measuring reciprocal throughput, but the statement still holds true when talking about latency. You can expect to complete a science pack in 72 seconds (24+16*3) with no parallelism, and 40 seconds (24+16) with.
That's still reciprocal throughput. 1/(science packs/second). You're measuring the time delta between the production of two consecutive science packs, but this measurement implicitly hides all the work the rest of the factory did in parallel. If the factory is completely inactive, how soon can it produce a single science pack? That time is the latency.
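The distinction, sketched on a toy pipelined factory (stage count and step time are hypothetical):

```python
STAGES, STEP = 3, 16   # hypothetical: 3 sequential steps, 16 s each

# Reciprocal throughput: with the pipeline full, every stage hands off
# at once, so a finished pack drops out every STEP seconds.
reciprocal_throughput = STEP        # 16 s between consecutive packs

# Latency: start from a completely inactive factory and time a single
# pack from raw input to finished output. Nothing is hidden in parallel.
cold_start_latency = STAGES * STEP  # 48 s for the first pack

print(reciprocal_throughput, cold_start_latency)
```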
>You can measure latency for individual components, and you can measure latency for a whole pipeline. Adding parallelism can decrease latency for a pipeline, even though it didn’t decrease latency for a single component. That’s why the idea that serial performance = latency breaks down.
Suppose instead of producing science packs, your factory produces colored cars. A customer can come along, press a button to request a car of a given color, and the factory gives it to them after a certain time. You want to answer customer requests as quickly as possible, so you always have black, white, and red cars ready, which cover 99% of the requests, and your factory continuously produces cars in a black-white-red pattern, at a rate of 1 car per hour. Unfortunately your manufacturing process is such that the color must be set very early in the pipeline, and this changes the entire rest of the production sequence. If a customer comes along and presses a button, how long do they need to wait until they can get their car? That measurement is the latency of the system.
The best case is when there's a car already ready, so the minimum latency is 0 seconds. If two customers request the same color one after the other, the second one may need to wait up to three hours for the pipeline to complete a three-color cycle. But what if a customer wants a blue car? I've only been talking about throughput. Nothing of what I've said so far tells you how deep the pipeline is. It's entirely possible that even though your factory produces a red car every three hours, producing a blue car takes you three months. If you add an exact copy of the factory you can produce two red cars every three hours, but producing a single blue car still takes three months.
Adding parallelism can only affect the optimistic paths through a system, but it has no effect on the maximum latency. The only way to reduce maximum latency is to move more quickly through the pipeline (faster processor) or to shorten the pipeline (algorithmic optimization). You can't have a baby in one month by impregnating nine women.
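In numbers (the three-month pipeline depth is the assumption from the car example above):

```python
CYCLE_RATE = 1.0                     # cars per hour off the end of the line
PIPELINE_DEPTH_HOURS = 3 * 30 * 24   # ~3 months of in-flight work

def red_cars_per_hour(factory_copies: int) -> float:
    """Throughput scales with copies; a red car is every third car."""
    return factory_copies * CYCLE_RATE / 3

def blue_car_wait_hours(factory_copies: int) -> float:
    """Worst case: an off-cycle color must traverse the whole pipeline.
    Extra copies don't shorten it; only a faster or shorter pipeline does."""
    return PIPELINE_DEPTH_HOURS

print(red_cars_per_hour(2))    # 0.67 -> doubled throughput
print(blue_car_wait_hours(2))  # 2160 -> maximum latency unchanged
```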
> If the factory is completely inactive, how soon can it produce a single science pack? That time is the latency.
That is what I am trying to explain, now for the second time.
Let's say, for simplicity's sake, you have a magic factory that turns rocks into advanced circuits after 16 seconds. You need 3 advanced circuits for one chemical science pack. If you only have one circuit factory, you need to wait 3 * 16 = 48 seconds to produce three circuits. If you have three circuit factories that can grab from the conveyor belt at the same time, they can start work at the same time. Then the amount of time it takes to produce 3 advanced circuits, starting with all three factories completely inactive, is 16 seconds, assuming you have 3 rocks ready for consumption.
The time it takes to produce a chemical pack, in turn, is the time it takes to produce 3 circuits, plus 24 seconds to turn the finished circuits into a science pack. It stands to reason that if you can produce 3 circuits faster in parallel than sequentially, you can also produce a chemical science pack faster end to end.
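Spelled out, starting from a completely inactive factory (numbers from the example):

```python
CIRCUIT_TIME = 16   # seconds, rock -> advanced circuit
LAB_TIME = 24       # seconds, 3 circuits -> chemical science pack
N_CIRCUITS = 3

# One circuit factory: the three circuits are built back to back.
serial_latency = N_CIRCUITS * CIRCUIT_TIME + LAB_TIME   # 48 + 24 = 72 s

# Three circuit factories grabbing rocks at once: the circuits finish
# together, so only one CIRCUIT_TIME sits on the critical path.
parallel_latency = CIRCUIT_TIME + LAB_TIME              # 16 + 24 = 40 s

print(serial_latency, parallel_latency)
```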
> Suppose instead of producing science packs, your factory produces colored cars. A customer can come along, press a button to request a car of a given color, and the factory gives it to them after a certain time. You want to answer customer requests as quickly as possible, so you always have black, white, and red cars ready, which cover 99% of the requests, and your factory continuously produces cars in a black-white-red pattern, at a rate of 1 car per hour. Unfortunately your manufacturing process is such that the color must be set very early in the pipeline, and this changes the entire rest of the production sequence. If a customer comes along and presses a button, how long do they need to wait until they can get their car? That measurement is the latency of the system.
Again, I understand this, so let me phrase it in a way that fits your analogy. Creating a car is a complex operation that requires many different pieces to be created. I'm not a car mechanic, so I'm just guessing, but at a minimum you have the chassis, the engine, the tires, and the panels.
If you can manufacture the chassis, engine, tires, and panels simultaneously, it will decrease the total latency of producing one unit (a car). I'm not talking about producing different cars in parallel. Of course that won't decrease the latency to produce a single car. I'm saying you parallelize the components of the car. The time it takes to produce the car, assuming every component can be manufactured independently, is the maximum amount of time it takes across the components, plus the time it takes to assemble them once they've been completed. So if the engine takes the longest, you can produce a car in the amount of time it takes to produce an engine, plus some constant.
Before, the amount of time is chassis + engine + tires + panels + assembly. Now, the time is engine + assembly, because the chassis, tires, and panels are already done by the time the engine is ready.
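A minimal sketch of that fan-out with threads (the build times are invented, scaled down to seconds):

```python
import time
from concurrent.futures import ThreadPoolExecutor

# Hypothetical build times in seconds (scaled down for the demo).
PARTS = {"chassis": 0.3, "engine": 0.5, "tires": 0.1, "panels": 0.2}
ASSEMBLY_TIME = 0.2

def build(part: str) -> str:
    time.sleep(PARTS[part])   # stand-in for manufacturing the component
    return part

start = time.perf_counter()
with ThreadPoolExecutor(max_workers=len(PARTS)) as pool:
    list(pool.map(build, PARTS))   # all four components in flight at once
time.sleep(ASSEMBLY_TIME)          # final assembly once everything is done
elapsed = time.perf_counter() - start

# ~0.7 s (engine + assembly), versus the 1.3 s serial sum of every step.
print(f"car produced in {elapsed:.2f} s")
```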