Understanding Kafka with Factorio (2019) (ruurtjan.medium.com)
196 points by pul on July 13, 2023 | 109 comments



I recently started playing Factorio, and I kept thinking that this is what "low code" integration/automation tools should look like. Developer tooling with extremely clear visuals, obvious dataflow, endless combinations into which the rigidly defined components can be assembled to do exactly what they do.

As opposed to so many takes on "flow-based" programming, which present some imperfect nodal representation of the program, but rarely can the user make sense of what's going on by watching stuff move around as the thing executes.

And by the way, be sure you're ready to sink some time in if you're curious about this game...it's just too good, and I've had to consciously reduce the time I'm spending, because I could just keep optimizing...building...expanding...optimizing...it's built in the shape of the reward center of my brain.


>it's built in the shape of the reward center of my brain

Yes, it's like a distillation of the feeling I get from the most enjoyable parts of my job, risking productivity loss from real-world responsibilities until the complexity rises high enough to require the project organization and advanced planning of tasks that are the least enjoyable parts of my job, along the lines of:

"Crap, I have a sudden urgent need to deal with enemies creeping out from beyond my radar range that will push back operationalization of my proof-of-concept production pipeline. I'd estimate 3 man hours are required to perform a one-off fix on the enemies & radar expansion, maybe 5 to automate long-term... damnit I need a break, let me VPN into work to decompress from Factorio stress."


Yes, but it also embodies the worst parts of my job, which is that there is always more to do and the work is never ending.


At least in the case of Factorio you choose to do it. On the other hand, you don’t get paid for it.


You should try Foxhole facility maintenance. Makes Factorio stress feel like a tickle.


I wonder what the real-world equivalent of biters is… sales?


Too many to list. This past week’s productivity was partially hijacked by a vendor relationship gone sour to the point of getting the lawyers involved.

And so if sales are biters, where do lawyers rank?


I think lawyers are getting added in the upcoming expansion pack.


lawyers are the spitters


Security issues in software


This is actually a pretty great analogy. Could be even closer if the biters came in and stole materials, or replaced things on cargo belts as well as just trying to destroy things.


rad hits


> And by the way, be sure you're ready to sink some time in if you're curious about this game...it's just too good, and I've had to consciously reduce the time I'm spending, because I could just keep optimizing...building...expanding...optimizing...it's built in the shape of the reward center of my brain.

I feel the same. It scratches so many itches. I made the mistake of installing Space Exploration mod after the first run… so many hours invested now.


SE is such a deep hole to fall into. I feel for you.


any good explainers of what's fun about SE for those who will never play it?


It adds hundreds of hours of gameplay, new endings, multiple planets, and much longer progression chains that really push your ability to optimize to the limit.

If the base game isn’t enough to keep you interested anymore because it’s gotten too simple, space exploration is a good hand grenade to mix things up.


The negative thing about SE is also that it adds hundreds of hours of gameplay. I’m 200 hours in and I’ve barely made it to other planets. The 2 hours a day I have to play with it are simply not enough.


man i wish i had some time to sink into factorio. it sounds so fun. wish i was 14 again.


well, at least you didn't install sea block!


SE is such a slog but I must keep going!


> it's built in the shape of the reward center of my brain.

This is a very accurate description of how I feel about factorio. Thanks, I'm going to use this going forward.


I am interested in this topic, so thank you for mentioning it.

I haven't played Factorio, but this article's GIFs helped me understand it, at least from a high-level, simplified perspective!

I think most of programming is just logistics: moving data from one place to another, picking and choosing what fields to use for a given purpose and then calling a function with those selected parameters or talking to another system.

I am working on a number of experiments in this area. I'm working on a programming environment that inverts the usual model: rather than specifying instructions with the state implied, you work directly with state and the instructions are implied.

The problem with nodal editors is that they're not very information dense.


Semi-related: Are other people also annoyed by how many projects are using names of completely unrelated famous things? I expected to read some wild association between a game and Franz Kafka, but no, it's about a streaming platform which happens to be named "Kafka". This gets seriously annoying when you google some term, a historical one for example, and your search results are littered with completely unrelated software/IT projects that reuse the name for no particular reason. "Factorio" is actually an example of how to do better: Just make up your own word!


Even worse, Kafka and Apache Kafka have almost no meaningful connection. According to Wikipedia, the author was reading (Franz) Kafka, who was a writer, and the software system is "optimized for writing". (Franz) Kafka wasn't "optimized" for anything, so this is just whimsical naming. It could just as well have been named Apache Hemingway, or Apache Tolstoy.

Whimsical naming is ok, but can also be confusing and annoying.


Wow, that's so unrelated. I think I'd tacitly assumed that it was named that because adopting an event-driven architecture results in byzantine and overly complicated software, like the bureaucracies in Kafka's novels.


I’d be shocked if your explanation wasn’t the real one. “Optimized for writing” sounds like the sort of justification you give when your project with a sarcastic self-deprecating name becomes surprisingly successful.

If they wanted to name it after a fast, efficient famous writer it would be Apache Hemingway.

Kafka died before most of his writing was published and most of it was destroyed, which doesn’t seem like something the software would want to be associated with, right?


Your comment is on point. However...

... if you read Jay Kreps's introduction to logs [1] (in the Kafka sense of logs) you can see that while he has a nice sense of humor [2], he felt pretty good about the Log abstraction and about Kafka. In no sense do I get the feeling he thought he was creating a kludge or something bad -- or "kafkaesque". Judging by the article, it might as well have been named Apache Tolstoy!

[1] https://engineering.linkedin.com/distributed-systems/log-wha...

[2] "'Each working data pipeline is designed like a log; each broken data pipeline is broken in its own way.' — Count Leo Tolstoy (translation by the author)"


Or maybe you wake up one morning and discover it turned into a giant bug...


"Kafkaesque" suggests people standing in very long queues. It doesn't seem unreasonable as a name for very long queue software.


I think many such names are mostly chosen because they kind of sound nice and they have a prestigious association.


I was at one point thinking of making a Kafka (the software) competitor called Tuberculosis, but maybe that was only funny in my head.


On the one hand, Apache Kafka is a pretty well-known topic on this hacker/web/computing-centric message board, so for most people it doesn't need elucidating.

On the other hand, Factorio arguably has some Kafka-esque aspects!


It would help tremendously if the title was changed to "Understanding Apache Kafka with Factorio".


I've read a few of Franz Kafka's short stories, but most of them went over my head. As someone who loves Factorio, I was legitimately excited to get more insight.

Now: My disappointment is immeasurable and my day is ruined.


I never thought how weird it would be to read Kafka in anything but German - I bet it's like reading Dostojewski in anything but Russian, but somehow worse?


Isn't it because of the beneficial SEO? I'm sure we all well understand the benefits of landing a product on the first results page of a well-known historical figure.


Yes, it should have been Apache Kafka in this case.


> Vertical scaling — a bigger, exponentially more expensive server

This is in practice not true at all. Vertical scaling is typically a sublinear cost increase (up to a point, but that point is a ridiculous beast of a machine), since you're (typically) upgrading just the CPU and/or just the RAM or just the storage; not all of them at once.

There are instances where you can get nearly 10x the machine for 2x the cost.


Disagree: typically vertical scaling is lumpy, and even worse, CPU and RAM upgrades are typically not linear, because you're limited by the number of slots/sockets, and the manufacturers intentionally charge exponentially higher prices for the largest RAM and fastest CPUs.


With clouds this is not true anymore. They are exactly linear. If you ask for a smaller node they are simply provisioning a chunk of a larger machine anyway.

There is a point where the exponential pricing starts, but that point is much further out than most people expect. Probably ~100 CPUs, ~1 TB RAM, >50 Gbps network, etc.


They're linear... because they're charging you rates based on the cost of the large server, divided down into whatever server you provisioned.

Amusingly, for $94K (probably more like $85K after negotiation) you can buy a white box server: Dual Epyc 9000, 96 core/192thread, 3.5GHz, w/ 3TB RAM, 240T of very fast SSD, and a 10G NIC. The minimum config, Dual Epyc 9124, 16core/32thread, 64GB RAM, and only 4TB of storage is $9K (more like $8K after negotiation). That's "only" a factor of 10 in price for 8X CPUs, 48X the RAM, and 60X the storage.


> They're linear... because they're charging you rates based on the cost of the large server, divided down into whatever server you provisioned.

And the reason they do it that way is because it's cheaper. Because the scaling is sublinear up to a good size.


Kafka is also a system that can make pretty good general use of more CPUs and more storage, but doesn't have much need for RAM. Tying the CPU and RAM together whether by CPU model or cloud vendor offerings is annoying if you're trying to scale only vertically.


Kafka can keep a decent amount of data in RAM via the file system page cache. Oftentimes you end up wasting CPUs on Kafka nodes, not memory, I think.

https://docs.confluent.io/platform/current/kafka/deployment....


I find that if you are seeking lots of consumers around large topics no amount of RAM is really sufficient, and if you are mostly sticking to the tails like a regular Kafka user, even 64GB is usually way more than enough.

CPU isn't usually a problem until you start using very large compactions, and then suddenly it can be a massive bottleneck. (Actually I would love to abuse more RAM here but log.cleaner.dedupe.buffer.size has a tiny maximum value!)

Kafka Streams (specifically) is also configured to transact by default, even though most applications aren't written to be able to actually benefit from that. If you run lots of different consumer services this results in burning a lot of CPU on transactions in a "flat profile"-y way that's hard to illustrate to application developers, since each consumer, individually, is relatively small; there are just thousands of them.


If they charge these big numbers more it's precisely because they're trying to capture some of the embarrassingly better value you get from vertical scaling. It's a testament to vertical scaling's effectiveness that they _can_ do so.


Sure, but by doing so they consume the effectiveness?


No, because you pay a fixed cost to get higher performance and then benefit through the whole lifetime of the product (I'm assuming you are purchasing rationally and keep your machines loaded at 75% or better, and your software is not egregiously wasteful).


The kind of server you'd run Kafka on tends to already be pretty far up the curve. I don't think I can get 10x our default broker for 20x the cost. Maybe 100x the cost. (I could probably get 2x it for 2x the cost but once you value HA the practical inflection point starts below the actual cost intersection.)


The idea is: don't let logic get in the way of promoting "web scale" software.


For small consumer products sure, but we're talking at the extreme end of performance and physical capabilities. Sure you can get a 2 GHz CPU for ~2x the price of a 200 MHz CPU, but how much are you going to pay for a 6.0 GHz CPU vs 5.0 GHz? 6.1 GHz vs 6.0 GHz?


Think cores instead of clock speeds.

In the case of cloud instances, doubling cores is frequently less than 100% more expensive.


https://aws.amazon.com/msk/pricing/ prices scale linearly with CPU beginning with m5.large, and I wouldn't really want to run a production Kafka on anything less than m5.xlarge. (They do at least keep linearly scaling all the way up.) Speculating wildly, I could probably have run some of our real clusters on the equivalent of an 8xlarge, but of course 32-core systems were not widely available at that time. The cluster I run today would struggle even on a hypothetical 48xlarge.

YMMV for non-managed stuff, but really, you can only bump cores like 3 times realistically, 4 if you started really shitty, before you start getting into special pricing brackets.


Increasing core count is not really vertical scaling. It's a hybrid between vertical and horizontal scaling, having some characteristics of both. It also tops out quite early (especially its cost-effectiveness for many use cases, but there's an absolute upper limit as well).


You can go from an 8C/16T Epyc 7xxx series CPU to a 32C/64T CPU and not even double the cost.


That's more like horizontal scaling, though. You get more throughput (transactions per second) but not lower latency (seconds per transaction). Though it may be more cost-effective to have a single 32-core machine than two 16-core machines.


I disagree with this definition of horizontal scaling. If you're moving to a bigger computer rather than more computers, then you're scaling vertically and not horizontally.

(and fwiw, wikipedia agrees with this definition: https://en.wikipedia.org/wiki/Scalability#Horizontal_(scale_... )


Then it sounds like you have a disagreement of terminology with FTA, since the article is using the terms like I am. Vertical scaling means increasing the serial performance of the system, and horizontal scaling means increasing the parallel performance of the system. In this sense, vertical scaling past a certain point does indeed get exponentially more expensive, while horizontal scaling almost always scales linearly in cost, or better.


The terms are used loosely and it doesn't make a lot of sense to argue about the definitions.

I think it's true to say that vertical scaling normally is done by increasing the RAM and CPU of a single machine with a single address space and switch/bus. While horizontal scaling is normally adding more machines (additional addresses spaces and switch/bus). Historically this is because RAM to CPU performance (throughput and latency) in a single address space and bus greatly exceeds the performance of any NIC connecting machines with distinct address spaces and busses. And it mostly ignores effects like the performance costs of swapping/paging when you don't have enough RAM.

I haven't really seen many systems where horizontal scaling is truly linear, unless the problem is embarrassingly parallel, like serving static content.


Note that I was referring to scaling of cost, not of performance. If your application parallelizes ideally, then in the worst case your cost will scale linearly, because you just add more machines and increase your power consumption by new_machine_count/previous_machine_count. It's possible adding more processors in the same address space increases the cost by an amount below new_core_count/previous_core_count, in which case the cost scales better than linearly.


What I'm commenting on is this phrasing from the article

> Vertical scaling — a bigger, exponentially more expensive server

> Horizontal scaling — distribute the load over more servers


Ok, I see where the lay person would get confused on this. In the context of this article, every core is what Wikipedia calls a "node". There is no difference between a single 32C CPU and 4x 8C CPUs except for their ability to share memory faster. Both are similarly defined as horizontal scaling in the context of this article. You're not going to finish a single workload any faster, but you're going to increase the throughput of finishing multiple workloads in parallel.

The fact that AMD chooses to package the "nodes" together on one die vs multiple doesn't change that.


The ability to “share memory faster” is a bigger distinction than you make it out to be. Distributed applications look quite different from merely multithreaded or multiprocess shared-memory applications due to the unreliability of the network, the increased latency, and other things which some refer to as the fallacies of distributed computing. To me, this is usually what people mean when they talk about “horizontal” vs. “vertical” scaling. With modern language-level support for concurrency, it hurts much more to go from a shared memory architecture to a distributed one than to go from a single-thread architecture to a multithreaded one.


The wikipedia article qualifies what it means with vertical scaling

> typically involving the addition of CPUs, memory or storage to a single computer.


This is one of those times when I feel like you just didn't read anything I typed. So... I'm just gonna let you be confidently incorrect.


I'm reading what you're typing, but I just don't agree with it. It's also contradicted by both the article we're discussing and the Wikipedia article; further, it's an interpretation of vertical scaling that effectively doesn't exist in practice.

The distinction between horizontal and vertical scaling becomes nonsense if we accept your definitions, because literally nobody does that sort of vertical scaling.


Wrong. If you do any of these you're scaling vertically, even by that definition:

* Replace the CPU with a faster one, but with the same number of cores. Or simply run the same one at a higher clock rate.

* Add memory, or use faster memory.

* Add storage, or use faster storage.

These are all forms of vertical scaling because they reduce the time it takes to process a single transaction, either by reducing waits or by increasing computation speed.

> It's also contradicted by both the article we're discussing and the wikipedia article

The article agrees with this definition. Transaction latency decreases iff vertical scale increases. Transaction throughput increases with either form of scaling. Without this interpretation, the analogy to conveyor belts makes no sense.
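
To put rough numbers on that framing, here is a minimal sketch (plain Python; the belt length, speed, and item spacing are made-up illustration values, not anything from the article) of why a faster belt lowers per-item latency while more belts only add throughput:

    def latency(belt_length_m, belt_speed_m_per_s):
        # Time a single item spends on the belt; only a faster belt
        # (vertical scaling) shortens this.
        return belt_length_m / belt_speed_m_per_s

    def throughput(belt_count, belt_speed_m_per_s, item_spacing_m):
        # Items delivered per second; either more belts (horizontal scaling)
        # or faster belts (vertical scaling) raise this.
        return belt_count * belt_speed_m_per_s / item_spacing_m

    print(latency(10, 1), throughput(1, 1, 0.5))  # baseline:   10.0 s, 2.0 items/s
    print(latency(10, 2), throughput(1, 2, 0.5))  # vertical:    5.0 s, 4.0 items/s
    print(latency(10, 1), throughput(2, 1, 0.5))  # horizontal: 10.0 s, 4.0 items/s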


Think of it this way, instead. Building a multi-belt system is a pain in the ass that complicates the design of your factory. Conveyor belt highways, multiplexers, tunnels, and a bunch of stuff related to the physical routing of your belts suddenly becomes relevant. But you can still increase throughput keeping a single belt, if your bottleneck is not belt speed but processing speed (in the industrial sense). I can have several factories sharing the same belt, which increases throughput but not latency.

Also, it's worth pointing out that increasing the number of processing units often _does_ decrease latency. In Factorio you need 3 advanced circuits for the chemical science pack. If your science lab can produce 1 science pack every 24 seconds but your pipeline takes 16 seconds to produce one advanced circuit, your whole pipeline is going to have a latency of 48 seconds from start to finish due to being bottlenecked by the advanced circuit pipeline. Doubling the number of processing units in each step of the circuit pipeline will double your throughput and bring your latency down to 24 seconds, as it should be. And if you have room for those extra processing units, you can do that without adding more belts.

The idea that serial speed is equivalent to latency breaks down when you consider what your computer's hardware is really doing behind the scenes, too. Your CPU is constantly doing all manner of things in parallel: prefetching data from memory, reordering instructions and running them in parallel, speculatively executing branches, et cetera. None of these things decrease the fundamental latency of reading a single byte from memory with a cold cache, but it doesn't really matter because at the end of the day we're measuring some application-specific metric like transaction latency.


>Also, it's worth pointing out that increasing the number of processing units often _does_ decrease latency. [...]

This isn't latency in the same sense I was using the word. This is reciprocal throughput. Latency, as I was using the word, is the time it takes for an object to completely pass through a system; more generally, it's the delay between a cause and its effect/s. For example, you could measure how long it takes for an iron ore to be integrated into a final product at the end of the pipeline. This measure could be relevant in certain circumstances. If you needed to control throughput by restricting inputs, the latency would tell you how much lag there is between the time when you throttle the input supply and the time when the output rate starts to decrease.

>The idea that serial speed is equivalent to latency breaks down when you consider what your computer's hardware is really doing behind the scenes, too. Your CPU is constantly doing all manner of things in parallel: prefetching data from memory, reordering instructions and running them in parallel, speculatively executing branches, et cetera. None of these things decrease the fundamental latency of reading a single byte from memory with a cold cache, but it doesn't really matter because at the end of the day we're measuring some application-specific metric like transaction latency.

Yes, a CPU core is able to break instructions down into micro-operations and parallelize and reorder those micro-operations, such that instructions are retired in a non-linear manner. Which is why you don't measure latency at the instruction level. You take a unit of work that's both atomic (it's either complete or incomplete) and serial (a thread can't do anything else until it's completed it), and take a timestamp when it's begun processing and another when it's finished. The difference between the two is the latency of the system.


> This isn't latency in the same sense I was using the word.

But it is. If you add more factories for producing advanced chips, you can produce a chemical science pack from start to finish in 24 seconds (assuming producing an advanced circuit takes 16 seconds). Otherwise it takes 48 seconds, because you’re waiting sequentially for 3 advanced circuits to be completed. It doesn’t matter that the latency of producing an advanced circuit didn’t decrease. The relevant metric is the latency to produce a chemical science pack, which _did_ decrease, by fanning out the production of a sub-component.

Edit: actually, my numbers are measuring reciprocal throughput, but the statement still holds true when talking about latency. You can expect to complete a science pack in 72 seconds (24+16*3) with no parallelism, and 40 seconds (24+16) with.

> Yes, a CPU core is able to break instructions down into micro-operations and parallelize and reorder those micro-operations, such that instructions are retired in a non-linear manner. Which is why you don't measure latency at the instruction level.

That’s what I’m saying about Factorio, though. You can measure latency for individual components, and you can measure latency for a whole pipeline. Adding parallelism can decrease latency for a pipeline, even though it didn’t decrease latency for a single component. That’s why the idea that serial performance = latency breaks down.


>actually, my numbers are measuring reciprocal throughput, but the statement still holds true when talking about latency. You can expect to complete a science pack in 72 seconds (24+16*3) with no parallelism, and 40 seconds (24+16) with.

That's still reciprocal throughput. 1/(science packs/second). You're measuring the time delta between the production of two consecutive science packs, but this measurement implicitly hides all the work the rest of the factory did in parallel. If the factory is completely inactive, how soon can it produce a single science pack? That time is the latency.

>You can measure latency for individual components, and you can measure latency for a whole pipeline. Adding parallelism can decrease latency for a pipeline, even though it didn’t decrease latency for a single component. That’s why the idea that serial performance = latency breaks down.

Suppose instead of producing science packs, your factory produces colored cars. A customer can come along, press a button to request a car of a given color, and the factory gives it to them after a certain time. You want to answer customer requests as quickly as possible, so you always have ready black, white, and red cars, which are 99% of the requests, and your factory continuously produces cars in a black-white-red pattern, at a rate of 1 car per hour. Unfortunately your manufacturing process is such that the color must be set very early in the pipeline and this changes the entire rest of the production sequence. If a customer comes along and presses a button, how long do they need to wait until they can get their car? That measurement is the latency of the system.

The best case is when there's a car already ready, so the minimum latency is 0 seconds. If two customers request the same color one after the other, the second one may need to wait up to three hours for the pipeline to complete a three-color cycle. But what if a customer wants a blue car? I've only been talking about throughput. Nothing of what I've said so far tells you how deep the pipeline is. It's entirely possible that even though your factory produces a red car every three hours, producing a blue car takes you three months. If you add an exact copy of the factory you can produce two red cars every three hours, but producing a single blue car still takes three months.

Adding parallelism can only affect the optimistic paths through a system, but it has no effect on the maximum latency. The only way to reduce maximum latency is to move more quickly through the pipeline (faster processor) or to shorten the pipeline (algorithmic optimization). You can't have a baby in one month by impregnating nine women.


> If the factory is completely inactive, how soon can it produce a single science pack? That time is the latency.

That is what I am trying to explain, now for the second time.

Let's say you have a magic factory that turns rocks into advanced circuits, for simplicity's sake, after 16 seconds. You need 3 advanced circuits for one chemical science pack. If you only have one circuit factory, you need to wait 3 * 16 seconds to produce three circuits. If you have three circuit factories that can grab from the conveyor belt at the same time, they can start work at the same time. Then the amount of time it takes to produce 3 advanced circuits, starting with all three factories completely inactive, is 16 seconds, assuming you have 3 rocks ready for consumption.

The time it takes to produce a chemical pack, in turn, is the time it takes to produce 3 circuits, plus 24 seconds to turn the finished circuits into a science pack. It stands to reason that if you can produce 3 circuits faster in parallel than sequentially, you can also produce chemical science packs faster.

> Suppose instead of producing science packs, your factory produces colored cars. A customer can come along, press a button to request a car of a given color, and the factory gives it to them after a certain time. You want to answer customer requests as quickly as possible, so you always have ready black, white, and red cars, which are 99% of the requests, and your factory continuously produces cars in a black-white-red pattern, at a rate of 1 car per hour. Unfortunately your manufacturing process is such that the color must be set very early in the pipeline and this changes the entire rest of the production sequence. If a customer comes along and presses a button, how long do they need to wait until they can get their car? That measurement is the latency of the system.

Again, I understand this, so let me phrase it in a way that fits in your analogy. Creating a car is a complex operation that requires many different pieces to be created. I'm not a car mechanic, so I'm just guessing, but at a minimum you have the chassis, engine, tires, and the panels.

If you can manufacture the chassis, engine, tires, and panels simultaneously, it will decrease the total latency of producing one unit (a car). I'm not talking about producing different cars in parallel. Of course that won't decrease latency to produce a single car. I'm saying you parallelize the components of the car. The time it takes to produce the car, assuming every component can be manufactured independently, is the maximum amount of time it takes across the components, plus the time it takes to assemble them once they've been completed. So if the engine takes the longest, you can produce a car in the amount of time it takes to produce a engine, plus some constant.

Before, the amount of time is chassis + engine + tires + panels + assembly. Now, the time is engine + assembly, because the chassis, tires, and panels are already done by the time the engine is ready.
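
As a rough sketch of the same point (plain Python, reusing the 16-second circuit and 24-second assembly numbers from the science pack example above), the latency difference between crafting the circuits sequentially and in parallel:

    import math

    # Numbers from the example above: each advanced circuit takes 16 seconds to
    # craft, and a chemical science pack needs 3 circuits plus 24 seconds of assembly.
    CIRCUIT_TIME, CIRCUITS_NEEDED, PACK_ASSEMBLY = 16, 3, 24

    def pack_latency(circuit_assemblers):
        # Circuits craft in parallel across the assemblers; ceil() handles the
        # case where the circuits don't divide evenly across them.
        batches = math.ceil(CIRCUITS_NEEDED / circuit_assemblers)
        return batches * CIRCUIT_TIME + PACK_ASSEMBLY

    print(pack_latency(1))  # 72 s: one assembler crafts the circuits one after another
    print(pack_latency(3))  # 40 s: all three circuits craft at once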


Or other people can disagree with your interpretation, especially because the analogy is somewhat strained and highly oversimplified.


The article defines vertical scaling as using faster conveyor belts (serial performance) and horizontal scaling as using more conveyor belts (parallel performance).

So your example of adding more CPU cores would be horizontal scaling, while using a faster core would be vertical. Vertical scaling has diminishing returns.


Also beyond a certain point, it makes sense to go straight to dedicated bare metal. The AWS tax is not worth paying if your workload is mostly fixed, somewhat fault tolerant (i.e. failed hardware on the weekends can be replaced on Monday without major interruption to business operations), and CPU bound. Get a high end machine on Hetzner and put everything behind a VPN or API auth and you will save more than 50% in spending.


I haven't found this to be true generally unless your workloads are truly completely static, which I've never actually experienced.

Given what engineers at this level cost, their costs per hour dealing with all of the nonsense clouds handle for you (networking, storage, elastic scaling, instant replacement of faulty servers, load balancing, yadda yadda) end up being higher than whatever tax you're paying for using the cloud.

Economies of scale are real.


Related:

Understanding Kafka with Factorio (2019) - https://news.ycombinator.com/item?id=29304414 - Nov 2021 (72 comments)

Understanding Kafka with Factorio - https://news.ycombinator.com/item?id=20362179 - July 2019 (84 comments)

(Reposts are fine after a year or so; links to past threads are just to satisfy extra-curious readers)


It's always good to see Cracktorio on the front page of HN; hopefully someone will make a similar article showing the similarities between Factorio and drugs.


Or human culture as a video game in general, and humans as semi-rational, semi-aware characters in the video game, except some of the characters are special in that their job is to deceive and exploit other characters, including constructing illusions indistinguishable from reality of who is good and who is bad, what is true and what is not, what we should do and should not do, etc. And to make it all even more exciting, not all of the illusionists are aware of their actual, often hybrid role in the big scheme of things.

Maybe Shakespeare or some famous philosophers would have seen this angle, were video games to exist in their era. Unfortunate timing I guess.


Cute, but over years of explaining it I think any explanation of Kafka that presents it as a queue is bound to leave the reader with more misaligned expectations than when they started (while also making them think they learned something, which can be even more dangerous). To keep the Factorio-esque framing, move the consumers, not the messages.


Agreed. There's an important difference between things like Kafka & Kinesis vs RabbitMQ & SQS. The latter are conceptually queues, and the former are conceptually logs. Logs and queues can both be used in many of the same use cases, but it's important to understand how they are different.


i.e.: Items are frequently removed from queues which have a shared "next item", while logs usually just get longer and each consumer is responsible for keeping track of their own progress or positions.

It's harder to think of factory-game analogies for logs, since they involve copying without altering the original sequence. It would have to involve some kind of moving non-destructive sensor or object-cloner mechanic.
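
For anyone who hasn't used either, a toy sketch of the difference (plain Python, not any real Kafka or queue API): consuming from a queue removes the shared next item, while reading a log only advances that consumer's own offset:

    from collections import deque

    # Queue: a single shared "next item"; consuming it removes it for everyone.
    queue = deque(["ore-1", "ore-2", "ore-3"])
    print(queue.popleft())  # "ore-1" is gone; no other consumer will ever see it

    # Log: an append-only sequence; each consumer only advances its own offset.
    log = ["ore-1", "ore-2", "ore-3"]
    offsets = {"smelter": 0, "analytics": 0}

    def poll(consumer):
        record = log[offsets[consumer]]
        offsets[consumer] += 1  # only this consumer's position moves
        return record

    print(poll("smelter"), poll("smelter"))  # ore-1 ore-2
    print(poll("analytics"))                 # ore-1 again: the log itself was not altered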


Idk... maybe it's because I'm self-taught and have been coding since the age of 11, but I don't find the indirect approach helpful; quite the opposite.

I believe that's why OO is so popular: people who only know the object way of thinking, and who have difficulties with the virtual and abstract, like OO and condemn the pragmatic approach.


> If you don’t have a lot of time to spare, don’t download Factorio.

Wish I'd read this months ago. Factorio sucked my life in until I managed to launch that damn ship.


Who else was disappointed to find it's Apache Kafka and not Franz Kafka?


And now I think of everything I do in Factorio models.


I was 100% expecting something about the writer. :)


One morning, when Gregor Samsa woke from troubled dreams, he found himself transformed into a piece of iron ore on a moving conveyor belt.


That one made me laugh so much I annoyed my wife.


Both Franz Kafka and Factorio come from Prague after all.


Bad software is way more popular than a good writer.


Why are you downvoting him? He's right :)


I clicked this and was immeasurably disappointed that the article talks about Apache Kafka, and not the author Kafka and how to understand his work with Factorio.

That would be a much, much better article.


[flagged]


I'm not sure how to feel about the fact that we're starting to see ChatGPT responses to questions like this in forums.


I'm willing to make an exception in this case, because using LLMs for nonsense is exactly the use case they are built for.


Yes, but nonsense has no place here. There are other places to shitpost; namely, this would be perfectly placed on the r/copypasta subreddit (NSFW, navigate with caution).


It's getting too troublesome to post to HN. Three levels deep of self-affirming comments about why the original post should be censored. Cool.


[removed for opinion]


OK. 3rd repost of the same material. Really?


reposts are completely acceptable here.



what about understanding our own imperialistic civilization?

crash on a planet (or a new continent?) and proceed to expand while decimating native life?

in any case, good game.


That's what all life forms do — consume everything they can and reproduce as much as they can. So, doing it _is_ life.


That's an oversimplification. If you look closer, you might find that most living things are mostly living in symbiosis with their surroundings. Your human body is an example of cooperation of simple life forms. And when those living cells of your body consume everything they can and reproduce as much as they can, then this is called cancer.


Survivorship bias: Mutual benign coexistence and symbiosis within the present are rooted in massacres of the past.


Those statements do not contradict each other. Life forms consume and reproduce as much as they can, and they also live in cooperation and symbiosis. The latter can often be the most effective strategy to consume and reproduce as much as one wants.


"That's what all life forms do — consume everything they can and reproduce as much as they can"

Well, if you take this statement literally, then I think they do contradict each other - see my cancer example. But in the way you apparently meant it, no, there is no contradiction.


yes, agree

the issue, then, is how to decide what "as much as we can" even means; if we do "as much as we can" but this kills us, then it will have turned out that we indeed could not do that much


The native life already has massive coverage and is expanding into the remaining space.

In a typical game the player doesn't decimate them; if anything, they make the native life flourish.



