
Why not use PostgreSQL instead? It supports a JSON document data type natively. It also has exceptional stewardship as an open source project.

Mongo should never be a first choice, but a last choice for edge cases.



I really enjoy using PostgreSQL, but I just don't know how to make it scale easily. Running it on a large VM in the cloud works fine until you have lots of data or need it to be quickly accessible from everywhere. How can you have the data in three different regions (e.g. Europe, US, Asia) when you're using something like Google Cloud? It seems to be a hard problem to crack.


Generally speaking, the only reason that would be difficult is legal constraints on data locality.

If it were just about performance, then read-only replicas in each region would solve most of the issues.

For the legal case, I generally just end up with a separate PostgreSQL DB in each region, and the region is determined by user/company at login.

It usually ends up being around 1k LOC total.


Try Aurora Postgres Global Database. It's on AWS, but it should be accessible from other clouds.


Postgres has terrible indexing with json. It doesn't keep statistics, so simple queries sometimes take much longer than expected because the query planner doesn't know much about the data.


“It doesn’t keep statistics” is a weird way to say “I expect full table scans to always be fast”.

Create a functional index.
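
Something like this, as a rough sketch (assuming a table json_test with a jsonb column json_column, matching the names used later in the thread):

  -- Expression ("functional") index on one key inside the jsonb document.
  CREATE INDEX idx_json_test_name
    ON json_test ((json_column ->> 'name'));

  -- A query that uses the same expression can do a btree lookup
  -- instead of a full table scan.
  SELECT *
  FROM json_test
  WHERE json_column ->> 'name' = 'jane';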


I’m not sure I even understand what you are talking about or why you said that.

If you have pid, {name: ’val’, others...} and an index on name, with a million Johns and one Jane, good luck getting fast results.


> If you have pid, {name: ’val’, others...} and an index on name, with a million Johns and one Jane, good luck getting fast results.

Fast results: https://dbfiddle.uk/?rdbms=postgres_10&fiddle=dd9370966b1528....

Postgres does actually keep statistics on json columns, but if you've got a functional index on the table and the query uses it then it doesn't matter if there is one "jane" and a million "johns". You're looking up a key in a btree index.


No it doesn’t. But what do I know running several clusters in production for a $100 million business? Please read up on the subject before arguing.


> SELECT most_common_vals

> FROM pg_stats

> WHERE tablename = 'json_test'

> AND attname = 'json_column';

> {"{\"name\": \"john\"}"}

Hmm. Looks like it does, though. Not that it makes a damn bit of difference, because if you haven't got a functional index (i.e. the stats are next to useless) then you're doing a full table scan, and in that case it sounds like you “expect full table scans to always be fast” :)

And sure, the statistics don't help the query planner unless you've got a computed column, but again see "I expect full table scans to always be fast" and reconsider the statement "postgres doesn't keep statistics on json columns" given that it actually does, just like for any other column.

Read up indeed!


I’ve seen that as well; the default selectivity estimate for jsonb can seriously confuse the query planner. There is a patch in PG13 that addresses this as far as I understand, but I’m not familiar enough with PG internals to be sure I’m reading that right. I’ll be playing with this when PG13 is out. The jsonb feature is really useful, though I wouldn’t recommend shoving relational data into it: many things are much, much harder to query inside jsonb than in regular columns.

There are ways around the statistics issue in some cases: defining a functional index on a jsonb property, for example, will collect proper statistics for that expression.
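
A rough sketch of that (table and column names here are placeholders):

  CREATE INDEX idx_docs_name ON docs ((payload ->> 'name'));
  ANALYZE docs;

  -- Statistics for the indexed expression are stored under the index's name;
  -- the expression column is usually reported as "expr".
  SELECT attname, n_distinct, most_common_vals
  FROM pg_stats
  WHERE tablename = 'idx_docs_name';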


DB noob question: if you know that you should be indexing on a json attribute, can’t you put it into a «proper column» and index there?


There are a number of ways to do this:

* Extract the attributes you're interested in into their own columns, index these. With the extraction happening outside the database, this is the most flexible option.

* Similar to above, use a trigger to automatically extract these attributes.

* Also similar to above, use a generated column[0] to automatically extract these attributes.

* Create an index on the expression[1] you use to extract the attributes.

My use of JSON in PostgreSQL tends towards the first option. It works well enough for cases where documents are ingested and queried, but not updated. The last three options are automatic: add or change the JSON document and the extracted/indexed values are updated automatically (a sketch of the last two options follows the links below).

[0] https://www.postgresql.org/docs/12/ddl-generated-columns.htm...

[1] https://www.postgresql.org/docs/12/indexes-expressional.html
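
A rough sketch of the generated-column and expression-index options (table/column names are made up; generated columns need PostgreSQL 12+):

  CREATE TABLE docs (
    id      bigserial PRIMARY KEY,
    payload jsonb NOT NULL,
    -- Generated column: Postgres keeps it in sync on every write.
    name    text GENERATED ALWAYS AS (payload ->> 'name') STORED
  );

  -- Index the generated column like any ordinary column...
  CREATE INDEX docs_name_idx ON docs (name);

  -- ...or skip the extra column and index the expression directly.
  CREATE INDEX docs_name_expr_idx ON docs ((payload ->> 'name'));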


You could, of course. But that would mean you are effectively not using json anymore. You need to pull the data out of your json on each write, update it in two places, and so on. And if you need to delete that attribute from the json, what do you do with the extracted column? You need to delete it as well. You are then managing two things.

There is always a trade-off. If the column is important enough, then you are right, it should stand on its own, but then you lose the json flexibility. I personally use jsonb almost exclusively when I only care about the object as a whole and rarely need to poke around to find an exact value. As the grandparent comment mentions, if you do need a particular value, it might be slower if your JSON records are too different (if you think about it, how can you calculate selectivity stats on a value if you have no idea how wide or varied the JSON records are?).


> Why not use PostgreSQL instead? It supports a JSON document data type natively.

Yes, that's the thing: it's just a field type. It's not really that different from dumping your JSON in a TEXT column. MongoDB is fun because it's truly JSON - BSON - so you don't have to run migrations, you can store complex documents, and you have a more object-oriented way of storing your data than SQL.


You should probably read the Postgres documentation [1] before making erroneous claims like this. Postgres JSON fields can be destructured, queried, and aggregated well enough to cover at least 90% of MongoDB use cases.

I'll grant that Postgres probably isn't as much fun as Mongo, what with all its tiresome insistence on consistency and reliability. I would, however, argue that quantity of available fun isn't really a figure of merit here.

[1] https://www.postgresql.org/docs/10/functions-json.html
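
For instance, destructuring and aggregating jsonb looks roughly like this (hypothetical orders table with a jsonb column named doc):

  -- Pull scalar fields out of each document and aggregate on them.
  SELECT doc ->> 'status' AS status,
         count(*) AS orders,
         sum((doc ->> 'total')::numeric) AS revenue
  FROM orders
  GROUP BY doc ->> 'status';

  -- Expand a nested array of line items into rows.
  SELECT o.id, item ->> 'sku' AS sku, (item ->> 'qty')::int AS qty
  FROM orders o,
       jsonb_array_elements(o.doc -> 'items') AS item;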


> It's not really that different than dumping your JSON in a TEXT column

That was true of the initial "JSON" type support.

It is very much not true of the "JSONB" type, which was added in 2014 as part of Postgres 9.4. JSONB uses a binary serialization that supports efficiently selecting into JSON documents, putting regular BTREE indexes on specific fields inside the documents, or even putting Elasticsearch-like inverted indexes on complete JSON documents.
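
The inverted index here is a GIN index; a minimal sketch (placeholder table/column names):

  -- GIN index over whole documents, built for containment queries.
  CREATE INDEX events_doc_idx ON events USING gin (doc jsonb_path_ops);

  -- "Find documents containing this sub-document" can use the index.
  SELECT *
  FROM events
  WHERE doc @> '{"user": {"country": "DE"}}';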


This is so not true that it hurts. Postgres jsonb is highly queryable.


It's completely different from dumping json into a text field...

Read the docs: you can do a lot of fancy JSON stuff in plain ol' Postgres. It's really powerful and gives you real guarantees about your data.



