This post was about using Cassandra for message storage. You are basically advoc...

manigandham · on Jan 21, 2017

> and forces routing to clients.

Both Cassandra and Redis Cluster will forward queries to the correct nodes and both use client drivers that learn the topology to properly address the right nodes when querying.

cookiecaper · on Jan 20, 2017

Ah, I see. Thank you for clarifying that this is not the service to which jhgg referred.

Obviously, I haven't addressed this problem in-depth and I don't really know enough about the specifics to criticize the decision directly. It's completely possible that Cassandra was the perfect fit here.

The previous thread was in the context of switching things up without strong technical motivation. I said that actually, it does seem easier to fix a redis deployment than to write a microservice architecture backed by Cassandra, and that I hope to hear more about a stable production ethos from the tech community as a whole. There are a lot "moving on to $Sexy_New_Toy_Z" posts, but not a lot of "we solved a problem that affected scaling our systems, and instead of throwing the whole component away and starting over, we actually did the hard work of fixing and optimizing it: here's how".

To address your specific complaints.

>You are basically advocating for plugging 2 systems together, which out of the box don't provide elasticity.

I mean, again, without getting in-depth, I'm not advocating anything (I feel like I need a "This is not technical advice; if you have problems, consult a local technician" disclaimer :P).

However, storing lots of randomly-accessed messages and maintaining reasonable caches are not new problems. There are lots of potential solutions here.

And while Cassandra is not "new tech" in the JavaScript-framework-is-a-grandpa-after-6-months sense, it's certainly "new tech" in the "I'm the system of record for irreplaceable production data" sense.

Cassandra is also among the less-used of the NoSQL datastores, putting it in a minority of a minority for battle-tested deployments. You mention Netflix and someone other big name using it in production as part of your belief that it's stable. This, I think, is part of the problem.

These big companies use these solutions because

a) they truly do have a scale that makes other solutions untenable (although probably not the case with Cassandra itself);

b) they can easily afford the complex computing and labor resources needed to run, repair, and maintain a large-scale cluster. Such burdens can be onerous on smaller companies (esp. labor cost);

c) when they need a patch or when something starts going awry, they can pay anyone whose willing to make the patch, their own team not excluded. Often the BDFLs/founders of such projects end up directly employed by these big companies that adopt their tech.

"Netflix [or any other big tech name] uses it so we know it's stable" is a giant misnomer, IMO.

None of this is to say that Cassandra isn't a good choice for this problem or any other specific problem, because again, as a drive-by third-party I don't know. But contrary to what the article states, it hardly seems like Cassandra was the only thing that could've possibly fit the bill. I bet it could be done well with a traditional SQL databases (which, from the body of the post that identifies Discord as beginning on MongoDB and planning to move to something Cassandra-ish later on, it doesn't sound like was ever tried or considered).

It's kind of like reading an RFP that was written by a a guy at a government agency that already knew they really wanted to hire their brother-in-law's firm. "Must have $EXTREMELY_SPECIFIC_FEATURE_X, because, well, we must! And it just so happens that this specific formulation can only be provided by $BIL_FIRM. What d'ya know."

>We care about seamless elasticity for our services which Redis doesn't provide out of the box except with Redis Cluster which does not seem to be wildly adopted and forces routing to clients.

First, you just admitted that Redis actually does have that feature that you're saying it doesn't have. "Redis Cluster" and "Redis" are the same thing. "Redis Cluster" is part of normal redis and afaik, while it requires additional configuration, it will automatically shard.

In any case, while I have no numbers, I would wager that Redis Cluster is more widely used than Cassandra.

manigandham · on Jan 21, 2017

Cassandra doesn't require all the data to live in RAM. At this scale, you need disk-based data access.