Rearchitecting Core Services at X (twitter.com/cambridgemike)
56 points by rglover 4 months ago | 29 comments



Pretty light on technical specifics.

What I find odd about all of this is that while they (finally) split reads from writes, there’s no distinction between hot and cold data. I assume 80% of all tweets are only read 20% of the time, if that. This would heavily influence my architecture regarding caching for example.
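
To illustrate: a bounded in-memory cache in front of cold storage means the hot minority of tweets never touches slow storage. A minimal Python sketch, assuming a made-up COLD_STORE and get_tweet (nothing from the post):

    from collections import OrderedDict

    COLD_STORE = {}          # stand-in for slow archival storage
    CACHE_CAPACITY = 10_000  # sized to hold only the "hot" minority of tweets

    _hot_cache = OrderedDict()

    def get_tweet(tweet_id):
        # Hot path: serve from memory and refresh recency.
        if tweet_id in _hot_cache:
            _hot_cache.move_to_end(tweet_id)
            return _hot_cache[tweet_id]
        # Cold path: fetch from slow storage, then promote into the cache.
        tweet = COLD_STORE.get(tweet_id)
        if tweet is not None:
            _hot_cache[tweet_id] = tweet
            if len(_hot_cache) > CACHE_CAPACITY:
                _hot_cache.popitem(last=False)  # evict the least-recently-read tweet
        return tweet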

RPC is also mentioned, leading to a “pull” architecture, when I would assume most of the work to be a “push” pipeline, and the work to be embarrassingly parallel. So there’s no real software complexity from interconnected data, just a large amount of it, but it’s easily sharded and solved by good hardware? So in summary it doesn’t actually sound interesting from an algorithm perspective. What services are they talking about for RPC, analytics? Do those really need to be so dynamic?
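
By “push” I mean something like fan-out-on-write; a toy sketch with hypothetical names (followers_of, timeline_shards), not anything the post describes:

    NUM_SHARDS = 16
    # shard index -> {user_id: [tweet_ids]}; each shard could live on its own box
    timeline_shards = [{} for _ in range(NUM_SHARDS)]

    def shard_for(user_id):
        return timeline_shards[user_id % NUM_SHARDS]

    def push_tweet(tweet_id, author_id, followers_of):
        # Fan-out on write: append the tweet to every follower's timeline now,
        # so each append touches only that follower's shard (embarrassingly parallel).
        for follower_id in followers_of(author_id):
            shard_for(follower_id).setdefault(follower_id, []).append(tweet_id)

    def read_timeline(user_id):
        # Reads become a cheap local list lookup instead of a fan-in of RPCs.
        return shard_for(user_id).get(user_id, [])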


Why do I get the impression that all of these FAANG-like companies spend billions of dollars of engineering reinventing partitioned databases, but with HTTP as the query protocol?

They talk about “granular field-level access” — that’s just a SELECT statement. They talk about separating the read only workloads — that’s read scale out replicas. They talk about Arrow and similar technologies that sound an awful lot like SQL queries.
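
Concretely, in a toy sketch (sqlite standing in for a hypothetical pool of read replicas; the table and column names are made up):

    import random
    import sqlite3  # stand-in for "primary plus read-only replicas"

    REPLICAS = [sqlite3.connect(":memory:") for _ in range(3)]
    for db in REPLICAS:
        db.execute("CREATE TABLE tweets (id INTEGER PRIMARY KEY, author_id INTEGER, text TEXT)")

    def read_tweet_fields(tweet_id):
        # "Granular field-level access" is just a column list in the SELECT;
        # "separating read-only workloads" is just sending reads to any replica.
        conn = random.choice(REPLICAS)
        return conn.execute(
            "SELECT id, author_id, text FROM tweets WHERE id = ?",
            (tweet_id,),
        ).fetchone()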


It could be that they're plowing billions of dollars into engineering and missing some obvious solution.

I think it's more likely that once you get down to the details (which are going to be much more intricate than a post like this can convey), it may turn out to actually be a hard problem that requires a bunch of engineering.

For a relevant example of the "how hard could it be"-mindset: See George Hotz's short stint at Twitter.


> See George Hotz's short stint at Twitter

Do you have any more information about this, or could provide a link?


Infamous iPhone hacker geohot joined Twitter for a three-month internship where he planned to fix search. He quit after one month, having accomplished nearly nothing.


Yeah, it may be a case of tucking tail, but that says nothing about whether it was a difficult engineering problem; it could just as well have been an impossible political situation to navigate (whose feelings are you going to hurt by deleting some microservice?).


There are a lot of finer details that influence how you design a system. Yeah, reading data is just a SELECT, but there are too many things to consider:

- Volume of reads: how many people are reading tweets?
- Volume of tweets: how many unique tweets are being served to people?
- Data skew: Elon's tweets see hundreds of millions of reads, vs. some tweets that only see like 10 (see the sketch below).
- Size of data: a tweet looks like only a few characters of data, but every tweet probably has hundreds of columns of metadata.

These are just a few considerations. Plus, scaling out replicas isn't always feasible past a point. Otherwise, why not just spin up one server per tweet and keep things simple?
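
Toy sketch of the skew point, with made-up thresholds and names: a handful of very hot tweets get replicated to a dedicated tier, while the long tail stays on ordinary hash-partitioned shards.

    HOT_READ_THRESHOLD = 1_000_000  # made-up cutoff for "celebrity" tweets
    NUM_SHARDS = 64
    read_counts = {}  # tweet_id -> observed read count

    def record_read(tweet_id):
        read_counts[tweet_id] = read_counts.get(tweet_id, 0) + 1

    def tier_for(tweet_id):
        # A tweet with hundreds of millions of reads would overwhelm any single
        # shard, so it gets cached everywhere; everything else hashes to one shard.
        if read_counts.get(tweet_id, 0) > HOT_READ_THRESHOLD:
            return "hot-tier (cached on every node)"
        return "shard-%d" % (tweet_id % NUM_SHARDS)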


Doing 60 million queries per second on a partitioned database isn't going to be easy either, and when you have problems you'll have to dig into the code of a database you didn't develop.


Yeah, is there a PostgreSQL deployment that does 60m qps?


Does Twitter actually need 60M qps, or is this just a consequence of not having JOIN operators?


Because you have a hammer and so everything sounds like a nail.

When you discuss a problem at a high enough level of abstraction, then yes everything looks like a database, and at a high enough level of traffic all databases need to be partitioned/replicated.

The much more interesting question would be "why doesn't the thing I'm familiar with solve this problem"?


Ah, but what about the tracking for ad targeting?

That tweet viewed by a million people isn't just a million tweet database reads, it's also a million tracking database writes.


Actually, that's not a problem anymore...


I interviewed at a ride-share company (one that was not then working on self-driving cars mind you) once some 5 years ago or so and was shocked to find out that they had 1,000 software engineers. What did they need so many engineers for?! But then all their queries were hand-written to some key/value store w/o a query engine -- that has to explode your engineering needs a bit, though maybe still not to 1,000.



That proves the opposite point. At small scale, FAANG solutions are nonsensical. At 100 petabytes, good luck running your code on a laptop.


On the other hand, we often create over-engineered solutions because taking the simpler route would require us to rewrite existing cruft, which is often just more expensive.


My hot take is that 90% of software engineering is just transforming data from one shape to another, and there are only so many ways to do that.

It doesn't matter if you're syncing between a database and server, between a presentation layer and an object model, or across two devices over a network.

Fundamentally you're dealing with an impedance mismatch that only has so many types of solutions.
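
For example (illustrative names only), the same record passes through three shapes, and most of the work is the mapping code between them:

    from dataclasses import dataclass

    row = (42, 7, "hello world", 3)  # storage shape: a flat database row

    @dataclass
    class Tweet:                     # object-model shape used inside the app
        id: int
        author_id: int
        text: str
        likes: int

    def from_row(r):
        return Tweet(id=r[0], author_id=r[1], text=r[2], likes=r[3])

    def to_api(t):                   # wire shape sent to a client or another device
        return {"id": t.id, "author": t.author_id, "text": t.text, "likes": t.likes}

    print(to_api(from_row(row)))     # same data, reshaped twice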


That’s not a hot take. Literally thousands of people have written and said this exact same thing, for years.


I don’t think that’s a hot take at all, I’m pretty sure that was lesson one in the data structures course at my school.


Distributed systems is the answer. Data isn’t in one sql database.


It's how they get paid. Replacing bloat with extremely simple equivalents like a stock MySQL instance isn't going to convert well to paychecks. It also means your output will be erratic (because it'll be opportunistic), your architecture design will be less resilient, and you'll be the brilliant jerk who's hard to anticipate.

Elon Musk was supposed to be the uber-genius with the charisma to overcome that problem. I suppose he no longer is.


Musk fired thousands of Twitter employees and the system is still running. It’s even getting significant core feature upgrades! Clearly, most of those engineers were superfluous.

I’ve thought Twitter was over-engineered years before that event and feel a tiny bit vindicated in holding that opinion against the HN gestalt.


Because they make billions, and they have to spend it on software development.

That these software devs then go out and build slightly different offshoots because of business requirements and technical requirements is, in many ways, incidental. Devs at smaller companies end up writing poor-man's equivalents of various techs all the time, or write interesting things in their spare time. It just doesn't get the same attention because they're not as big.


Is this why I see the “Something went wrong - Retry” dialogue about fifty times a day now?


My take on this is that the client front-end UI will load basic pages, but when it tries to load the feed there's some network error that causes the feed not to load.


There is a curious lack of data showing the impacts of these changes (though perhaps that's forthcoming as they roll it out at scale).


It should be obvious no one good stayed there. Either they were fired or left disgusted. You need to be really desperate to stick with Elmo.


cool



