>“But postgres isn’t a document store!” I hear you cry. Well, no, it isn’t, but it does have a JSONB column type, with support for indexes on fields within the JSON blob.
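For the uninitiated, something like this is all it takes (table and field names made up for illustration):

```sql
-- Hypothetical sketch: a content table keeping the document in a JSONB column.
CREATE TABLE content (
    id  bigserial PRIMARY KEY,
    doc jsonb NOT NULL
);

-- A GIN index supports containment queries (@>) across the whole document...
CREATE INDEX content_doc_gin ON content USING GIN (doc);

-- ...while a B-tree expression index targets one specific field.
CREATE INDEX content_doc_type ON content ((doc ->> 'type'));

-- Each of these can then hit its index instead of a sequential scan:
SELECT id FROM content WHERE doc @> '{"type": "article"}';
SELECT id FROM content WHERE doc ->> 'type' = 'article';
```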
> approximately 2.3m content items.
I had a previous project where we did a similar thing (except with HSTORE instead of JSONB) and it exploded rather dramatically (very simple queries took multiple minutes or timed out entirely) after around 30m rows. I hope the Grauniad doesn't run into a similar issue, or at least anticipates it better than we did.
2.3m content items is tiny. So is 30m. You're at least one or two orders of magnitude away from anything that will start to bother Postgres. Anything before that is more likely an index or IOPS issue.
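If you want to tell which of the two it is, EXPLAIN is your friend. A minimal sketch, assuming the hypothetical content table from upthread:

```sql
-- EXPLAIN (ANALYZE, BUFFERS) shows whether a query uses an index or falls
-- back to a sequential scan, and how many buffers it had to read.
EXPLAIN (ANALYZE, BUFFERS)
SELECT id FROM content WHERE doc ->> 'type' = 'article';
-- A "Seq Scan on content" node over tens of millions of rows is the classic
-- sign of a missing expression index, not of Postgres hitting a wall.
```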
On that note, one thing the article points out is that managing the MongoDB setup was a full-time job... although managing Postgres will probably be one as well. It would be easy and understandable to try to make it static and needing no attention, but that's a recipe for bad times.
Having issues accessing X million rows seems like something that would be caught by someone focused full-time on the database's performance.
RDS is really good at not needing to be managed, tbh, at the Guardian's scale. And when shit does happen, they can throw money at AWS's premium support. I definitely think their move makes a ton of sense there.
Aye, even so. JSONB documents are definitely heavier to store, and their indexes can get quite big, so disk space / IOPS can become an issue sooner. But still, 2.3M is a drop in the ocean. (I don't know about HSTORE though…)
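If you're curious how much of that footprint is index overhead, Postgres will tell you directly (again assuming the hypothetical content table sketched upthread):

```sql
-- pg_total_relation_size includes indexes and TOAST; comparing it against
-- the heap alone shows how much of the disk footprint is index overhead.
SELECT
    pg_size_pretty(pg_relation_size('content'))       AS heap_size,
    pg_size_pretty(pg_indexes_size('content'))        AS index_size,
    pg_size_pretty(pg_total_relation_size('content')) AS total_size;
```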
At my previous startup I was ingesting Hearthstone games at a rate of 1-2M / day. Before being handed off to permanent storage (S3, Redshift, etc.), a bunch of the data would get stored in a JSONB column with 14-day retention. This all ran on a 200GB t2.large RDS instance, which was our smallest instance and never really caused an issue.
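The retention part is simple to implement; schematically it looked something like this (the games table and ingested_at column here are stand-ins, not our real schema):

```sql
-- An index on the timestamp keeps the periodic purge cheap.
CREATE INDEX IF NOT EXISTS games_ingested_at ON games (ingested_at);

-- Run on a schedule: drop everything older than the retention window.
DELETE FROM games
WHERE ingested_at < now() - interval '14 days';
-- At millions of rows/day, time-based partitioning with a DROP TABLE per
-- expired partition is usually cheaper than bulk DELETEs, but the idea is
-- the same.
```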