Is needing servers with 96 vCPUs and 768 GB of RAM each... normal?
I'm a programmer, but I do mobile dev, and that amount of resources for a single machine sounds comically high to me. I thought modern websites usually had more 'normally' specced servers and just distributed the load across a ton of them?
Can anyone recommend a Coursera/EdX or other online resource that goes over the basics of designing and setting up these kinds of high-performance server systems?
I honestly don't think you'll find a course that covers this. The SO devs (and others) have kept a blog going for years with the type of information you're after. The search term is "mechanical sympathy."
Language/framework also matters. You're not going to pull this off with a backend based on a framework that gives no damns about performance. Rust, C# (bleeding edge), Java and C come to mind as good candidates.
SO doesn't have the content requirements of a modern Social Network: images, video, and streaming are huge factors. Especially with modern smartphone capabilities.
I wrote a program that ran on that type of computer once. It was real-time log analysis. There was a lot of data coming in, and there was a need for a globally-consistent datastore for the results. Solution? Put it all in RAM. If the single replica crashes, you can just read all the inputs again.
Some extra capacity existed for that, and some other system already stored the raw logs to disk durably. The terabyte of in-memory data just made queries tolerable enough to run every few seconds and display in alert/chart form.
I didn't keep the system in this state for very long, but for version 1.0 it was just the thing -- idea to production in a short period of time. Eventually it did move to a more distributed system, as log volume (and usefulness) increased, and we had more time to deal with the details. It was mostly nice to not have to reprocess data after releases -- I could do them in the middle of the day without anyone caring.
My biggest worry when writing this was that 40Gbps of network bandwidth wouldn't be enough, but it was fine in the early stages. 40Gbps is a lot of data.
I'm not sure I'd say it's a great sign that you need a single beefy machine to run something, but it's a tradeoff worth considering. I found the distributed system version easier to operate, but it did constrain what sort of features you could add. I think we got it right, but it's easy to code yourself into a corner when you have encoded assumptions deep into the system. Best to avoid that until you're sure your assumptions are right.
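If it helps to picture the v1 pattern, here's a toy sketch in Python (the real system wasn't Python, and the record fields and paths are made up; the point is "all results in RAM, replay the durable raw logs to recover"):

```python
import json
from collections import defaultdict


class InMemoryAggregator:
    """Single-replica results store held entirely in RAM.

    Durability comes from elsewhere: another system already writes the raw
    logs to disk, so if this process dies you rebuild state by replaying
    them from the beginning.
    """

    def __init__(self):
        # Hypothetical metric: error counts per service.
        self.error_counts = defaultdict(int)

    def apply(self, record: dict) -> None:
        # One cheap update per incoming log record; no cross-node
        # coordination, so the view of the results is trivially consistent.
        if record.get("level") == "ERROR":
            self.error_counts[record.get("service", "unknown")] += 1

    def replay(self, lines) -> None:
        # Crash recovery / cold start: re-read the durable raw logs.
        for line in lines:
            self.apply(json.loads(line))

    def query(self, service: str) -> int:
        # Queries stay fast enough to run every few seconds for
        # alerts/charts because the whole working set lives in memory.
        return self.error_counts[service]


# Example usage (path is hypothetical):
# agg = InMemoryAggregator()
# with open("/var/log/raw/events.jsonl") as f:
#     agg.replay(f)
# print(agg.query("checkout"))
```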
To be clear (if the image is accurate), they weren't asking for one beefy DB instance with 96 vCPUs and 768GB RAM. They were asking for 70-100 instances with those requirements.
I remember reading a few days ago that their code was atrocious. Perhaps this is an extreme example of the costs of never refactoring? I can believe that the cost of spaghetti code compounds quickly.
I can believe that the company who named resources sequentially, enforced no security validation for viewing posts/media, and didn't strip metadata from media uploads also didn't have engineers especially skilled in optimization.
I agree that the other points indicate bad engineering. But not stripping metadata can be intentional. E.g. some smaller image hosts skip it deliberately to preserve files bit-identically, and some forums with a heavy emphasis on minimal moderation take the position that opsec is the poster's responsibility.
I don't know what 96 vCPUs means in terms of real cores, and my server knowledge is a bit old at this point, but here goes.
There's a benefit to running on fewer servers. If you can make good use of many-core machines and gobs of RAM, it makes sense to go up to at least reasonably large machines. For Intel, dual-socket Xeon is widely available and not obscenely priced; for AMD, I haven't seen a lot of dual-socket Epyc, but 64 cores in a single socket is quite a lot. 768 GB seems big, but if you can put it into one machine instead of 12 machines with 64 GB each, that helps reduce maintenance and communication overhead.
I ran systems with dual Intel Xeon E5-2690 v4s, a total of 28 cores/56 threads, and we put up to 768 GB in some of them; that was several years ago, and you can get a lot bigger now.
Databases love RAM, and social sites make a lot of queries, so it makes some sense. I don't know what their usage numbers are or what their site looks like; I'm just guessing based on the general description. The traffic numbers didn't look too big, but the type of request makes a big difference there; serving media is relatively easy, while serving comment threads and highlighting your friends is trickier.
(Serving media with transcoding is a lot less easy though)
This seems extreme, especially if they're going bare-metal; even a single one of these DB servers would handle a lot of traffic on bare metal.
I'm also not sure about running Postgres on 70-100 servers. Unless they're doing some sharding at the application level (rough sketch below), I'd expect the overhead of replication and the resulting network traffic to be insane if they're merely replicating across all of them.
Their whole website should be able to run on a handful of these machines; their main cost and resource usage would be hosting and converting uploaded media, not the DB or app servers.
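For what it's worth, by application-level sharding I mean something like this toy Python sketch (not anything Parler actually does; the connection strings and database name are invented):

```python
import hashlib

# Hypothetical shard map: each Postgres primary owns a slice of the users,
# instead of every server replicating every row.
SHARD_DSNS = [
    "postgresql://app@db-shard-00/socialdb",
    "postgresql://app@db-shard-01/socialdb",
    "postgresql://app@db-shard-02/socialdb",
    # ...one entry per primary
]


def dsn_for_user(user_id: str) -> str:
    # Stable hash of the user id -> shard index, so a given user's rows
    # always land on the same primary and no full-mesh replication is needed.
    digest = hashlib.sha256(user_id.encode("utf-8")).digest()
    index = int.from_bytes(digest[:8], "big") % len(SHARD_DSNS)
    return SHARD_DSNS[index]


# The application opens a connection to dsn_for_user(user_id) for that
# user's reads/writes; only the replicas *within* a shard need to replicate.
```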
Maybe they are trying to bring up multiple sites (i.e. the equivalent of AWS regions) for redundancy. If you have 10 "regions", then maybe the hardware requirements look a bit more reasonable (7-10 DB servers each, etc.)
It wouldn't matter for the anti-trust lawsuit, since they're alleging that Amazon conspired with Twitter to kick off Parler because Parler would steal Twitter's market share. (Well, trying to allege, because they don't even manage to allege that any further than "Amazon hosts Twitter" and implying there's no other reason Amazon would kick Parler off, despite spending half the brief complaining that they're being kicked off for being conservative.)
This is an absolutely bonkers amount of resources for the functionality and audience they have. We got our first 5 million users on 4 used PowerEdge nodes and our first 30 million with 2 bare-metal DB servers.
They're running Wordpress, which is famous for scaling badly to the extent that there are large companies like WPEngine.com devoted to doing nothing except working around that.
A lot of people like WordPress's back-end admin panel and its WYSIWYG editing, so a huge ecosystem of themes/mods/plugins has grown up around WP.
If you run a small website it's all fine; otherwise you have to deal with performance issues and WP limitations, because you've decided to build a social network on top of a blogging engine.
The reason is that they're running their social media site on WordPress (of all things, why?!?!?!), so it's probable that it will scale horrendously. The data breach that happened at Parler was allegedly due to an exploit in one of the WordPress plugins.