Get PostgreSQL Database Structure as a Detailed JavaScript Object

derefr · on Dec 5, 2019

I would like to point out that SQL itself standardizes a set of tables (called the information_schema) that can be queried, in a completely uniform way, to learn pretty much every (SQL-standard-formalized) fact about your database.

information_schema doesn't cover anything the DBMS is doing that's not standardized by SQL, though. This would mean that, for PG in particular, things like partitioning/inheritance, tablespaces, and the distinction between roles and schemas, aren't represented in the information_schema. (The objects do show up, but only as their SQL-standard "superclass"—e.g. partitioned tables just look like a set of regular tables + constraints + triggers + rewrite rules; table columns with special types look like their underlying storage type; etc.)

The information_schema also doesn't cover the pragmatics of the administration of the database instance itself, like, for PG, the type of data you'd access through any of the pg_stat_ tables.

But neither of these concerns are really relevant if you're building a generic tool that wants to prod at SQL-standard database objects, I would think. Just use the information_schema!

---

I would like to link here the relevant part of the SQL standards document, because it's actually a very helpful and thorough reference. (Even if you never program for more than one DBMS, just learning how it defines words like "catalog", "schema", and "object" will make reading any particular DBMS's docs 10x easier.)

But, sadly, the SQL standard itself is a proprietary document that you have to purchase! (See here: https://modern-sql.com/standard) This is a pretty odd thing, considering that unpaid developers of FOSS systems like PG need to reference the standard for compliance.

jorams · on Dec 6, 2019

> But, sadly, the SQL standard itself is a proprietary document that you have to purchase!

This is a massive problem with standards in general. ISO 8601, for example, the international standard for date and time representation that everyone should be using, similarly consists of multiple documents that you have to purchase at significant prices. The part covering "basic rules" costs 158 CHF (160 USD) and the part covering "extensions" costs 178 CHF (180 USD).

I think this is seriously hampering the adoption of standards outside of industries that explicitly require compliance, and particularly in open source. If you can't even know whether you're compliant without spending serious money, why care about compliance at all?

erichanson · on Dec 6, 2019

To derefr's point, I'll also point to the "meta" PostgreSQL extension (part of the Aquameta project, which I maintain) whose purpose is to provide a more readable version of information_schema:

https://github.com/aquametalabs/meta http://blog.aquameta.com/intro-meta/

information_schema is pretty unruly, and pg_catalog is just crazy, in terms of readability, but if you just want to do some simple introspection, meta might be helpful.

That said, I see value in a nice Javascript object that is easy to traverse and can be retrieved all at once.

aargh_aargh · on Dec 6, 2019

There's broken link in your article to https://github.com/aquametalabs/aquameta/tree/master/core/00...

xemoka · on Dec 5, 2019

Right, and anything else should be in the pg_catalog; information_schema are just views over that. https://www.postgresql.org/docs/current/catalogs.html

stephen · on Dec 5, 2019

True! Good point and clarification.

The nice aspect of pg-structure for me was getting all of that information_schema goodness already pulled out into hashes/DTOs from a 1-line "await pgStructure(...)" call.

I.e. w/o pg-structure, I'd probably end up writing a mini/hacky version of it that did the same thing, "do these ~3-4 SQL queries and mash them into some nice hashes for my code generator to consume".

Which is not terrible, but it's easier to just pull in another npm package. :-)

djrobstep · on Dec 5, 2019

As the author of a postgres-specific schema inspection tool, I found I very quickly ran into limitations of information_schema. Many of the critical details are simply not there.

hobs · on Dec 6, 2019

As a DBA every full fledged system I have worked with veers off of information schema almost immediately, and you are going to be forced to implement some special set of queries if you want an accurate version of any real db schema.

This will also likely reflect the current version of the database engine you are working with, there's no consistent approach to know everything for all databases for all versions.

miohtama · on Dec 5, 2019

A little bit similar tool for Python:

https://pypi.org/project/sqlacodegen/

Note that SQLAlchemy itself offers this functionality in core, codegen just makes it more human readable static files.

stevesimmons · on Dec 5, 2019

sqlcodegen is brilliant.

I used it to document a 1000-table db, and then added autogenerated docstrings for each table with # of rows/columns, list of the column names, and the contents of its first and last 5 rows.

This was invaluable in understanding a decade of legacy system development...

isoprophlex · on Dec 6, 2019

This is brilliant! Thanks for sharing this, made my day!

evangow · on Dec 5, 2019

This is a very cool project, but I'm not sure exactly what I would use it for. I could see it being useful if didn't start your project out using a tool for schema migrations and need to back reverse engineer things to create fake migrations.

I'm curious, does anyone here use it that would care to explain their use case?

paps · on Dec 5, 2019

We use it in our CI, just before deployment, to compare the DB structure of what's being tested with the DB structure of our staging or production environment.

It's a last minute check that prevented a lot of mistakes.

djrobstep · on Dec 6, 2019

You might be interested in the service I'm building, which does a similar thing in a more structured way: https://djrobstep.com/ci-for-databases

larkost · on Dec 6, 2019

Wouldn't a dump and then diff of the schema into text form accomplish the same thing?

hobs · on Dec 6, 2019

It absolutely should, though you might want to have a specific set of questions you ask in an ordered fashion as things like column order might differ between production and development due to data sizes, dropping and recreating dev instead of migration, etc.

None of those things actually being a problem could give you false positives, so you might want some minor shuffling.

fanf2 · on Dec 6, 2019

Or something a bit more high-level like Migra? https://github.com/djrobstep/migra

sciolistse · on Dec 5, 2019

It seems like it could be used for declarative schema management (like https://www.skeema.io/) where you define the table structure you want in code and the application diffs a database and performs whatever changes is needed to reflect the requested schema.

stephen · on Dec 5, 2019

I'm using it generate a "flush_database()" stored procedure (after applying the latest migrations), that means each "before each" of a test can "execute flush_database" to get a clean slate db for its boundary case (w/o having to know/lookup "oh we've got 10-20-50 some entity tables now").

So, yeah, just custom/minor infrastructure/tooling stuff that is based on the db schema.

aargh_aargh · on Dec 6, 2019

Doesn't your RDBMS support transactions?

djrobstep · on Dec 5, 2019

I have a similar project called `schemainspect` - It creates a data structure representation of a postgres schema.

I then use this data in my database diffing project `migra`, to autogenerate database migrations. You can also use schema diffing to test that your production and development databases match explicitly.

Overall it's a much more flexible but rigorous approach than the old-fashioned rails/django migrations.

kristiandupont · on Dec 6, 2019

I've made something similar which I use for two things. 1) I create typescript types from my schema, and 2) I run a "linter" over my schema that validates a number of rules that I've created. It's been a great help honestly.

rodw · on Dec 6, 2019

For what it is worth, if you mean the `>` symbol in the banner graphic to be interpreted as an arrow not "greater than" if I were you I'd give the arrow some kind of tail (`->` not just `>`). My first interpretation was definitely "greater than" and I'm probably not alone.

If you _do_ really mean "pgsql > json` and not `pgsql -to-> json` then maybe I'm just confused.

theta_d · on Dec 6, 2019

Unix redirecting output uses the `>` symbol.

rodw · on Dec 6, 2019

Oh, yeah. I am well aware of that and that interpretation did not enter my mind at all (being entirely surrounded by graphics and not in a conventional terminal-style font may be part of the reason why).

I still think it's a little confusing but I understand why it is that way at least.

soulnothing · on Dec 5, 2019

Is there a good source for the queries run on different data base types (MySql, Postgres) to reflect it's schema?

mvc · on Dec 5, 2019

The information schema is part of the SQL standard so the queries should be the same in all the databases that support that (which is quite a few). Here's the doc links for MySQL and PostgreSQL

https://dev.mysql.com/doc/refman/8.0/en/information-schema.h... https://www.postgresql.org/docs/12/information-schema.html

As it happens, I did a talk about this at re:Clojure this week. Videos aren't up yet but they will appear here when they are available

https://www.youtube.com/channel/UCbZW8yCqEncYciie8_1yy7w/fea...

pmart123 · on Dec 6, 2019

One way I found helpful was to look at ORM libraries like sqlalchemy to see how they queried each db to translate schemas into objects.

simbakhadder · on Dec 5, 2019

Is the expected use case to build generic queries on any type of database?

Also why Javascript?