
Our prod cluster generates that about every minute at O(1M) qps.

We JUST turned on remote logs because until now Kafka didn't have the capacity.



TIBCO Rendezvous is tech from 1998/99; millions of messages per second didn't exist at the time. Only NYSE and NASDAQ were capable of producing millions of events back then (and even then not per minute, let alone per second).

TIBCO Rendezvous was one of the first successful large-scale, low-latency, near real-time pub/sub implementations, and it had a very efficient, Avro-like, wire-level serialisation format that made messages very compact and cheap to deliver. It was very popular in finance, banking and manufacturing, but is all but legacy now.


$$$, not capability. We have ~50 hosts that generate up to a TB per day in just logs, and 50k hosts that generate O(200MB/day). For the large hosts, ssh and grep works surprisingly well, but the smaller hosts are where the real benefit is.

Hard to justify a team of 7 burning several million on just logging infra costs.


My previous company's Kafka cluster was handling 20 million messages per second 5 years ago, and dozens of petabytes of data per day. Maybe your particular cluster didn't have the capacity to handle 1M qps, but Kafka itself easily had that capacity years ago.


I have to ask: what business value does it add to store so much?


Kafka, when used correctly, is like the nervous system for your entire company. You use it like a message bus and dump every single thing into it, and you can extract it out at your leisure, but mostly in real time. It completely transforms how you do services in a company, but it also means you have to invest a lot of money and manpower into maintaining it, because it is mission critical.
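To make the "dump everything in, extract at your leisure" idea concrete, here's a rough in-memory sketch of the pattern (an illustrative toy, not the real Kafka API): producers append events to a shared append-only log, and each consumer tracks its own offset, so independent services can replay the same stream at their own pace.

```python
class Log:
    """Toy stand-in for a Kafka topic: an append-only event log."""
    def __init__(self):
        self.events = []  # every event ever published, in order

    def publish(self, event):
        self.events.append(event)

class Consumer:
    """Each consumer keeps its own offset into the shared log."""
    def __init__(self, log):
        self.log = log
        self.offset = 0

    def poll(self):
        batch = self.log.events[self.offset:]
        self.offset = len(self.log.events)
        return batch

log = Log()
metrics = Consumer(log)   # hypothetical downstream services
billing = Consumer(log)

log.publish({"type": "signup", "user": 1})
log.publish({"type": "click", "user": 1})

first = metrics.poll()    # metrics reads the stream so far
log.publish({"type": "signup", "user": 2})
second = metrics.poll()   # ...and resumes from its own offset
late = billing.poll()     # a consumer that starts late still sees everything
```

The key property this illustrates is that consuming doesn't destroy the data: `billing` polls after `metrics` has already read everything and still gets the full history, which is what lets many teams tap the same event stream independently.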


Not OP, but I think it isn't always about storing; it's about having a log of events that get routed, processed, and aggregated in many cases.


It was quota and hardware, not ability. This is a single service onboarding and they need the hardware.

And at that scale, we need to grep the logs, so the downstream needs the ability to process that volume, which it couldn't do until about 2 years ago.



