This is quite valuable advice: "The original version of Discord was built in just under two months in early 2015. Arguably, one of the best databases for iterating quickly is MongoDB. Everything on Discord was stored in a single MongoDB replica set and this was intentional, but we also planned everything for easy migration to a new database"
The article also links to a Twitter blog post from 2010 that makes a similar point: "We [Twitter] currently use MySQL to store most of our online data. In the beginning, the data was in one small database instance which in turn became one large database instance and eventually many large database clusters" [1]
Modern Reddit runs like ass for me, and I have a 2021 4.2GHz 8-core Ryzen CPU. I can't imagine what it's like on older systems, when even my 9th-gen Intel work machine was buckling under the site.
It’s incredible how slow their website is. I understand (but am against) their motivation to push everyone to the app on mobile devices, but on desktop where am I going to go? My powerful computers both struggle to render the site.
Seems like sound advice? Starting with DB clusters would be premature optimization, but designing your schema with the expectation that it will be on multiple instances later is sane. Kind of like building a monolith with the expectation that certain components will eventually be moved out into their own services.
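For example, a purely hypothetical chat schema sketched with the gocql driver (table and column names are mine, not Discord's actual schema): make the channel id the partition key from day one, so every access path is already expressed in terms of the key you would shard on later.

```go
package chat

import "github.com/gocql/gocql"

// Hypothetical messages table: channel_id is the partition key, so even on a
// single node every query already goes through the key you'd shard on later.
// Moving to many instances then doesn't require rewriting access paths.
func createMessagesTable(session *gocql.Session) error {
	return session.Query(`
		CREATE TABLE IF NOT EXISTS chat.messages (
			channel_id bigint,
			message_id timeuuid,
			author_id  bigint,
			content    text,
			PRIMARY KEY (channel_id, message_id)
		)`).Exec()
}
```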
> Nothing was surprising, we got exactly what we expected.
Such a satisfying feeling in the engineering world.
> We noticed Cassandra was running 10 second “stop-the-world” GC constantly but we had no idea why.
This makes me very thankful for the work that the Go team has put into the go GC.
> In the scenario that a user edits a message at the same time as another user deletes the same message, we ended up with a row that was missing all the data except the primary key and the text since all Cassandra writes are upserts.
Does Cassandra not offer a mechanism to do a conditional update? I'd expect to be able to submit an upsert that fails if the row isn't present, or has a `deleted = true` field, or something to that effect.
> This makes me very thankful for the work that the Go team has put into the go GC.
If you read that part of the piece again, the problem wasn't necessarily with Java's GC implementation but with Cassandra's storage architecture combined with a mass delete, which tombstoned millions of records that Cassandra then had to scan and skip over: "Cassandra had to effectively scan millions of message tombstones (generating garbage faster than the JVM could collect it)"
Even if you had an incremental GC, you'd be eating CPU and thrashing the cache like crazy doing that. And you may never catch up.
Actually, even if you weren't using a GC'd language, you'd likely still be paying for those scan or free operations.
Writing databases is hard. And actually the various JVM GCs are top notch and have had hundreds of thousands of engineering hours put into perfecting them for various workloads.
Still, these days I'm glad not to be near it and I work in Rust or C++.
I'm not discounting that the GC had an overwhelming amount of work to do.
My comment wasn't intended to convey that the go gc is somehow more efficient, but rather that I'm thankful for the tradeoffs they've made prioritizing latency and responsiveness, and minimizing the STW phase. I can see how I didn't give enough context for it to be interpreted correctly.
It's been my experience that with larger STW pauses, you wind up with other effects -- to the outside observer it's impossible to tell if the service is even available or not. You could argue that if you're thrashing memory that hard, no, it's not available. In general though, it makes for higher variance in latency distributions.
> Writing databases is hard. And actually the various JVM GCs are top notch and have had hundreds of thousands of engineering hours put into perfecting them for various workloads.
no doubt on both counts. again, thankful for the design tradeoffs.
> Still, these days I'm glad not to be near it and I work in Rust or C++.
For anything that really needs to be very consistent, or needs full control over execution, I don't blame you. Cassandra is a prime example of what happens when you don't have that full control.
I think they're actually talking about both at once.
Cassandra definitely has compaction (similar LSM storage as RocksDB), and it causes pauses.
It's been over 10 years since I touched Cassandra, but when I did there was often a royal dumpster fire when compaction & JVM GC ganged up to make a big mess.
"Conditional update" requires consensus to be fault tolerant, and regular Cassandra operations don't use consensus. Cassandra has a feature called light-weight transactions which can sort of get you there, and they use Paxos to do it. It unfortunately has a history of bugs that invalidate the guarantees you'd want from this feature, but more modern versions of Cassandra are improving in this regard.
> This makes me very thankful for the work that the Go team has put into the go GC.
Has the Go team hit on some GC algorithm that has escaped the notice of the entire compiler-creating world since 1995? Not to disparage Go at all, it's got brilliant people working on it, but you could boil the oceans at this point with the effort from the brilliant folks who have been working on JVM garbage collection for the last 20+ years.
The Go team has members with decades of experience writing garbage collection algorithms, including on the JVM, so it's not starting from scratch.
Go has value types and stack allocation that let it create _far_ less garbage than Java, and its heap objects tend to violate the generational hypothesis, because short-lived objects usually stay on the stack. This means the Go GC can comfortably operate with far less GC throughput than typical Java programs need.
> Has the Go team hit on some GC algorithm that has escaped the notice of the entire compiler-creating world since 1995?
It's like the saying about recycling: "1. Reduce 2. Reuse 3. Recycle". Just like in real life, it's far better to never have to go to all the way down to #3. If you never generate garbage in the first place, you won't have to waste resources cleaning it up. So it's less about how good the actual Go GC is and more that the language itself greatly discourages wasting memory via aggressive stack allocation, value types, and an allocation-conscious culture.
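A toy illustration of that (mine, not from the article):

```go
package main

import "fmt"

// point is a plain value type; nothing forces it onto the heap.
type point struct{ x, y int }

// byValue returns the struct by value; escape analysis typically keeps it on
// the stack, so it creates no garbage for the collector.
func byValue() point { return point{1, 2} }

// byPointer returns a pointer, so the struct escapes to the heap and becomes
// work for the GC later.
func byPointer() *point { return &point{1, 2} }

func main() {
	v := byValue()
	p := byPointer()
	fmt.Println(v.x+v.y, p.x+p.y)
	// Build with `go build -gcflags=-m` to see the compiler's escape-analysis
	// decision for each function.
}
```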
It actually has very poor throughput when compared to almost any JVM GC. This is mitigated somewhat by the commonality of value types.
The major advantage is mostly predictably short pauses. For web services it’s a reasonable trade off, but it can be a problem that the GC can’t be tuned for something else.
And their messages per second definitely scale with time of day, as illustrated by this image[0] from their blog[1] (given their CPUs are only really handling messages).
I haven't used Cassandra for about 3 years, and this is awakening memories. At a previous company I inherited a very badly assembled cluster that was used for a time series database. The guys who built it said (and no, I'm not kidding...) "we don't need to put a TTL on metrics because they're tiny and anyway we can just add more nodes and scale the cluster horizontally forever!". Well, forever was about 2 years, when the physical data center ran out of rack space, and teams abused the metrics system with TBs of bullshit data. That was when they handed the whole metrics system to yours truly. And I discovered two things:
1. You can't just bulk delete a year of old stale data without breaking it
2. "Woops, did we really set replication factor to 1?"
[sorry if you're reading in real-time, lots of edits]
It's mentioned briefly in the original article. Cassandra achieves extremely good write performance by being append-only, which means that its disk I/O patterns are sequential, with little random I/O. This gives great performance on crappy spinning disks. However, with append-only storage, updating or deleting existing data is handled by writing new data or "tombstones".
When you perform a search, Cassandra still scans all matching data in the table, including that which was deleted. It then uses tombstones to "ignore" deleted data. In other words, lazy delete is great for write performance (Cassandra is optimized for write), but creates overhead during read.
So to avoid killing the cluster during a search, Cassandra imposes a tombstone limit. If a search hits more tombstones than the limit, the search will be abandoned. Real-world example: if you're using Cassandra as a time series database and you don't add a sensible TTL to your metrics, a bulk delete would cause searches to suddenly fail, and they wouldn't work again until after compaction is performed on the table, rewriting the whole table minus data whose tombstones are older than the gc grace period.
Metrics systems aren't just used for eyeballing dashboards during an incident, they're part of your alerting system (usually), so this could lead to missed alerts or false positives.
This is why it's important to continually "age-out" data with a TTL, so that compactions are regularly removing tombstones older than the gc grace period.
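For what it's worth, here's roughly what that looks like (a sketch via the gocql driver; the keyspace and table names are made up, the point is the TTL, not the schema):

```go
package metrics

import (
	"time"

	"github.com/gocql/gocql"
)

// createSamplesTable sets a default TTL of 30 days, so rows expire on their
// own and compaction can quietly drop the expired data and its tombstones.
func createSamplesTable(session *gocql.Session) error {
	return session.Query(`
		CREATE TABLE IF NOT EXISTS metrics.samples (
			series text,
			ts     timestamp,
			value  double,
			PRIMARY KEY (series, ts)
		) WITH default_time_to_live = 2592000`).Exec()
}

// writeSample shows a per-write override: this row expires after 7 days.
func writeSample(session *gocql.Session, series string, v float64) error {
	return session.Query(
		`INSERT INTO metrics.samples (series, ts, value) VALUES (?, ?, ?) USING TTL 604800`,
		series, time.Now(), v).Exec()
}
```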
"Fucking gc grace period!" (me, on a few occasions)
Why is there a gc grace period in here, making trouble for us? Because Cassandra is a distributed, replicated system that handles partitions and failures, it knows that some nodes may be temporarily unreachable. So it holds on to tombstones long enough to ensure that every node gets them. If it didn't do this, an offline node might not get the tombstone before it's compacted out of existence on all the other nodes. When it comes back online, you get zombie deleted data, back from the dead! Not nice!
If you fuck this up and end up with a huge set of old data that you must delete, you are in for a treat. The easiest option is to manually schedule a bunch of operations to run during a quiet period. With metrics, write traffic volume is pretty steady but read traffic is user-driven, which could mean the quiet periods are nightly, on weekends, etc. The operations involve checking for downed nodes, modifying the gc grace period, deleting a bunch of data and running a compaction. You have to carefully monitor logs for errors, and keep an eye on tombstone limits. Depending on your infrastructure it might be easier to build a new cluster side-by-side, write all the data from the old cluster into the new one with proper TTLs, and delete the old cluster. If you're on bare metal and are out of space, you might have to get creative. (To keep a production metrics system running for a few years I had to get extremely creative.)
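Roughly, the per-table dance looks like this (a sketch via the gocql driver with made-up names, not a recipe to run blindly):

```go
package cleanup

import (
	"time"

	"github.com/gocql/gocql"
)

// purgeOldSeries sketches the "quiet period" cleanup described above for a
// made-up metrics table. It assumes every node is up (lowering gc_grace_seconds
// while a node is down is how you get zombie data back), and that you run a
// manual compaction, e.g. `nodetool compact metrics samples`, between the
// delete and restoring the grace period.
func purgeOldSeries(session *gocql.Session, series string, before time.Time) error {
	// Temporarily shorten how long tombstones have to be kept around.
	if err := session.Query(
		`ALTER TABLE metrics.samples WITH gc_grace_seconds = 3600`).Exec(); err != nil {
		return err
	}
	// The bulk delete only writes tombstones; it frees nothing until compaction runs.
	if err := session.Query(
		`DELETE FROM metrics.samples WHERE series = ? AND ts < ?`,
		series, before).Exec(); err != nil {
		return err
	}
	// Once the manual compaction has removed the tombstones, put the default
	// grace period (10 days) back.
	return session.Query(
		`ALTER TABLE metrics.samples WITH gc_grace_seconds = 864000`).Exec()
}
```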
Sorry for the huge thread and possibly wrong info, it's a while since I touched Cassandra, but I still have mental scars from the people who set it up wrong.
I don't know databases and there's quite a number of them on the market, so posts like these are great.
Sidetrack though, does anyone have a list of the pros and cons of each DB, with a preference towards low latency? Also, how do they compare with, say, Postgres?
That depends heavily on the shape of your data, what your workload looks like, what sort of consistency guarantees you need, etc. I recommend Designing Data Intensive Applications for getting a handle on this - it's the book I suggest to SDE IIs who are hungry for the jump to senior. Not a quick read, but well written, and there's not really a shortcut to deep understanding other than deep study.
> I suggest to SDE IIs who are hungry for the jump to senior
Any suggestions for seniors who are hungry for a staff title? Is there any value in specializing technically after senior, or is it all 'soft skills' at this point?
It is not all any one thing. If you’re completely lacking in soft skills, you’ll probably have more trouble the higher up you go, and you may get stuck at some point. Hard technical skills become less important the higher you go, but even the CEO of a tech company probably still needs to have some basic understanding of technology. Being good at both things is obviously the best way forward.
For the sake of that talk I focused only on open source you could self-host.
SQL
• PostgreSQL
• CockroachDB
NoSQL
• MongoDB
• Redis
• ScyllaDB [Cassandra]
There are a lot of points in the DBaaS world to also consider, but there are also issues of visibility: can these issues be known, perceived or understood from a user's POV? Are they documented? Or are some of these considerations obscured and locked behind trade secrets? So I had to pare down.
I planned to do a whole different talk on DBaaS systems, and yes, I'd likely shortlist Bigtable, Spanner, DynamoDB, and a few others for that.
Unfortunately, they're not doing a good job at deleting them. If you press the "Delete Account" button, all it does is anonymize your profile and leave all of your messages intact. One of the reasons I avoid using Discord whenever possible.
I can't think of a good reason why the presence or effectiveness of a delete button should change what you write online, even in a so-called "private conversation", or whether you use a certain platform.
Delete is useful, but it shouldn't change what you're willing to send online because delete doesn't time travel and unsend the message.
If you are typing anything you desire to stay hidden forever into a completely unencrypted, openly data-mined and brokered (read your privacy policy), cloud-based solution, then I have a bridge to sell you...
Discord doesn't respect privacy; you cannot just get rid of a whole conversation. Users are the product, and they make it so difficult to delete entire convos that it's obvious the data is valuable to them.
how many are bots? AFAIK bot traffic is higher than human traffic there. Plus they added so many bot APIs that now bots are hacking and spamming users' accounts.
Elsewhere in social media and gaming, many accounts may have both manual and automated activities associated with them.
For example, in a WoW chat, that might be your friend typing "lol," or it could be an add-on putting out an automated raid warning.
Similarly, your Twitter account could be a manual post, or a scheduled one from Tweetdeck.
Are you now a "bot account?"
Discord does permit bots in its system. But it tries to limit "self-bots" on the platform. It wants humans to be humans and bots to be bots. "Self-bots" in Discord could result in termination. Read more here:
[1] https://blog.twitter.com/engineering/en_us/a/2010/announcing...