
> I don't see why you need exactly-once write semantics.

World leaders use Twitter. It's a major international one-to-many communication platform. If tweets are lost or duplicated, it makes the platform look unreliable (because it literally would be) and potentially makes the tweeter look incompetent for posting twice. World leaders don't like to look incompetent; that can cause really bad things to happen...

> @BBCWorld is #50 and it drops to 38m accounts. #1000 has 2 million followers.

Even a write amplification factor of 100,000 is extremely problematic for the fully-materialized inbox model, and a lot of prominent Twitter users have followings far larger than that.
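To put that amplification in concrete terms, here's a back-of-envelope sketch using the follower counts quoted above (the posting rate of 100 tweets/day is an invented illustration, not a measured figure):

```python
# Back-of-envelope cost of fan-out-on-write for a fully-materialized
# inbox model. Follower counts are from the thread; the tweets/day
# figure is a made-up example.
FOLLOWERS_RANK_50 = 38_000_000   # e.g. @BBCWorld at ~#50
FOLLOWERS_RANK_1000 = 2_000_000  # ~#1000 account

def inbox_writes(tweets_per_day: int, followers: int) -> int:
    """Each tweet must be copied into every follower's inbox."""
    return tweets_per_day * followers

# A news account posting ~100 times/day to 38M followers:
print(inbox_writes(100, FOLLOWERS_RANK_50))  # 3,800,000,000 inbox writes/day
```

Even a single busy high-follower account generates billions of inbox writes per day under this model, before counting everyone else's traffic.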

> To the people offline you don't need to fan out in a timely manner.

So now you're adding additional systems on top, in order to scale. That's good, I guess you're starting to see that the problem is more complex than just spraying out every tweet to every follower's inbox. Now consider that when you actually build and scale a system like this, you'll need to keep doing that in a bunch of different areas, and the complexity keeps snowballing.
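The "don't fan out to offline users" idea amounts to a hybrid push/pull feed. A minimal sketch of that shape, with all names and the activity window invented for illustration:

```python
# Hypothetical hybrid fan-out: push eagerly only to recently-active
# followers; defer everyone else to be merged in lazily at read time.
# ACTIVE_WINDOW and all function/variable names are invented here.
ACTIVE_WINDOW = 7 * 24 * 3600  # followers seen in the last week count as active

def fan_out(tweet_id, followers, last_seen, inboxes, now):
    """Append tweet_id to active followers' inboxes; return the rest."""
    deferred = []
    for user in followers:
        if now - last_seen.get(user, 0.0) <= ACTIVE_WINDOW:
            inboxes.setdefault(user, []).append(tweet_id)  # eager write
        else:
            deferred.append(user)  # materialize later, or pull on read
    return deferred
```

Note that this is exactly the "additional system on top" being described: you now need activity tracking, a deferred-delivery path, and a read-time merge, on top of the inbox store itself.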

> And beefy enough hardware can handle those peaks.

There's no way to fit every user's fully-materialized inbox feed on one machine, so we're definitely talking about a large distributed storage tier / database here. Will you use "beefy" hardware for every single shard of your inbox storage tier?

> It is very unlikely that even a quarter of all his followers are using the app during that exact minute, so we're talking 30m writes in 60 seconds. Big whoop.

Once again, this really isn't like doing 30m write ops on a single box. It's queueing the writes via RPCs across a huge storage tier, while also needing some way to handle timeouts, retries, failovers on either side of the operation. All while the "normal" background level of thousands of tweets per second is happening from everyone else.
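A toy illustration of why this isn't "30m writes on a box": each inbox write is an RPC to one of many shards, and every one of those RPCs needs timeout and retry handling. The shard count and the `rpc` stub below are invented for the sketch:

```python
# Toy sketch: routing an inbox write to a shard and retrying on timeout.
# SHARD_COUNT and the rpc callable are illustrative inventions; a real
# system would also do backoff, failover to replicas, dedup, etc.
import hashlib

SHARD_COUNT = 4096

def shard_for(user_id: str) -> int:
    """Deterministically map a user to one shard of the inbox tier."""
    return int(hashlib.md5(user_id.encode()).hexdigest(), 16) % SHARD_COUNT

def write_with_retry(rpc, shard, payload, attempts=3):
    """Issue the write RPC, retrying a bounded number of times."""
    for _ in range(attempts):
        try:
            return rpc(shard, payload)
        except TimeoutError:
            continue  # real systems back off and may fail over here
    raise TimeoutError(f"shard {shard} unavailable after {attempts} tries")
```

Multiply this per-write machinery by tens of millions of deliveries in a burst, and note that naive retries reintroduce the duplicate-delivery problem from the top of the thread unless writes are made idempotent.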

> An appeal to authority isn't an impressive argument, I am also from this industry and with similar experience.

> There is no need to take things personally.

I've literally built a reverse-chronological social network activity feed implementation, which successfully scaled to over 110 million posts/day. (For the sake of comparison, Twitter was around 500 million tweets/day at that time, so this was definitely smaller than Twitter, but still quite large.) It did not use an inbox model. It took many months of my life and some of the most rigorous work I've ever done. My teammates and I evaluated several alternative designs, including a fully-materialized inbox, running all the numbers in depth and building several prototypes. The takeaway was that a naive fully-materialized inbox would be completely and ludicrously infeasible in terms of necessary hardware footprint.

Separately, I've also spent years working on database infrastructure at extreme scale, including one of the largest relational database footprints on earth. I have a very good sense of what this requires. Yes, I'm posting "opinions", but they are based on many years of direct personal expertise.

Scaling a social network involves a massive number of challenging problems. Faster hardware doesn't magically make these problems go away. And while I haven't worked at Twitter, up until this month I knew four infra/backend engineers working there, and they're some of the best engineers I've ever known in my 17-year career.

I'm taking your comments personally because your comments are offensive. You're blindly saying I need to "refresh [my] assumptions" about a topic I'm literally an expert in. You're claiming Twitter could use some completely asinine overly-simplistic feed model, as if no one else ever thought of that, which would strongly imply every infra engineer at Twitter must be an idiot. In another subthread on this page, you wrote "The job cuts are clearly justified because of the extremely toxic work culture / cult" and it is necessary to "replace every single person who worked there and the entire tech stack". Seriously, WTF? These are hard-working humans with lives and families, they don't deserve this shit from their employer, and certainly not from offensive pseudonymous randos who have no idea what they're talking about. Have some empathy.


Honestly, we originally did microservices because it sounded like a fun idea and because it would look really cool in our marketing materials. At the time, this was a very shiny new word that even our non-tech customers were dazzled by.

As oxidation and reality set in, we realized the shiny thing was actually a horrific distraction from our underlying business needs. We lost 2 important customers because we were playing type-checking games across JSON wire protocols instead of doing actual work. Why spend all that money on an expensive workstation if you are going to do all the basic bullshit in your own brain?

We are now back on a monolithic software stack. We also use a monorepo, which is a natural pairing with that approach. Some days we joke as a team about the days when we'd have to go check for issues or API contract mismatches across 9+ repositories. Now, when someone says "Issue/PR #12842" or provides a commit hash, we know precisely what that means and where to go to deal with it.

Monolithic software is better in literally every way if you can figure out how to work together as a team on a shared codebase. Absolutely no software product should start as a distributed cloud special. You wait until it becomes essential to the business and even then, only consider it with intense disdain as a technology expert.

