Hacker News new | past | comments | ask | show | jobs | submit login
Get PostgreSQL Database Structure as a Detailed JavaScript Object (pg-structure.com)
158 points by stephen on Dec 5, 2019 | hide | past | favorite | 29 comments



I would like to point out that SQL itself standardizes a set of tables (called the information_schema) that can be queried, in a completely uniform way, to learn pretty much every (SQL-standard-formalized) fact about your database.

information_schema doesn't cover anything the DBMS is doing that's not standardized by SQL, though. This would mean that, for PG in particular, things like partitioning/inheritance, tablespaces, and the distinction between roles and schemas, aren't represented in the information_schema. (The objects do show up, but only as their SQL-standard "superclass"—e.g. partitioned tables just look like a set of regular tables + constraints + triggers + rewrite rules; table columns with special types look like their underlying storage type; etc.)

The information_schema also doesn't cover the pragmatics of the administration of the database instance itself, like, for PG, the type of data you'd access through any of the pg_stat_ tables.

But neither of these concerns are really relevant if you're building a generic tool that wants to prod at SQL-standard database objects, I would think. Just use the information_schema!

---

I would like to link here the relevant part of the SQL standards document, because it's actually a very helpful and thorough reference. (Even if you never program for more than one DBMS, just learning how it defines words like "catalog", "schema", and "object" will make reading any particular DBMS's docs 10x easier.)

But, sadly, the SQL standard itself is a proprietary document that you have to purchase! (See here: https://modern-sql.com/standard) This is a pretty odd thing, considering that unpaid developers of FOSS systems like PG need to reference the standard for compliance.


> But, sadly, the SQL standard itself is a proprietary document that you have to purchase!

This is a massive problem with standards in general. ISO 8601, for example, the international standard for date and time representation that everyone should be using, similarly consists of multiple documents that you have to purchase at significant prices. The part covering "basic rules" costs 158 CHF (160 USD) and the part covering "extensions" costs 178 CHF (180 USD).

I think this is seriously hampering the adoption of standards outside of industries that explicitly require compliance, and particularly in open source. If you can't even know whether you're compliant without spending serious money, why care about compliance at all?


To derefr's point, I'll also point to the "meta" PostgreSQL extension (part of the Aquameta project, which I maintain) whose purpose is to provide a more readable version of information_schema:

https://github.com/aquametalabs/meta http://blog.aquameta.com/intro-meta/

information_schema is pretty unruly, and pg_catalog is just crazy, in terms of readability, but if you just want to do some simple introspection, meta might be helpful.

That said, I see value in a nice Javascript object that is easy to traverse and can be retrieved all at once.



Right, and anything else should be in the pg_catalog; information_schema are just views over that. https://www.postgresql.org/docs/current/catalogs.html


True! Good point and clarification.

The nice aspect of pg-structure for me was getting all of that information_schema goodness already pulled out into hashes/DTOs from a 1-line "await pgStructure(...)" call.

I.e. w/o pg-structure, I'd probably end up writing a mini/hacky version of it that did the same thing, "do these ~3-4 SQL queries and mash them into some nice hashes for my code generator to consume".

Which is not terrible, but it's easier to just pull in another npm package. :-)


As the author of a postgres-specific schema inspection tool, I found I very quickly ran into limitations of information_schema. Many of the critical details are simply not there.


As a DBA every full fledged system I have worked with veers off of information schema almost immediately, and you are going to be forced to implement some special set of queries if you want an accurate version of any real db schema.

This will also likely reflect the current version of the database engine you are working with, there's no consistent approach to know everything for all databases for all versions.


A little bit similar tool for Python:

https://pypi.org/project/sqlacodegen/

Note that SQLAlchemy itself offers this functionality in core, codegen just makes it more human readable static files.


sqlcodegen is brilliant.

I used it to document a 1000-table db, and then added autogenerated docstrings for each table with # of rows/columns, list of the column names, and the contents of its first and last 5 rows.

This was invaluable in understanding a decade of legacy system development...


This is brilliant! Thanks for sharing this, made my day!


This is a very cool project, but I'm not sure exactly what I would use it for. I could see it being useful if didn't start your project out using a tool for schema migrations and need to back reverse engineer things to create fake migrations.

I'm curious, does anyone here use it that would care to explain their use case?


We use it in our CI, just before deployment, to compare the DB structure of what's being tested with the DB structure of our staging or production environment.

It's a last minute check that prevented a lot of mistakes.


You might be interested in the service I'm building, which does a similar thing in a more structured way: https://djrobstep.com/ci-for-databases


Wouldn't a dump and then diff of the schema into text form accomplish the same thing?


It absolutely should, though you might want to have a specific set of questions you ask in an ordered fashion as things like column order might differ between production and development due to data sizes, dropping and recreating dev instead of migration, etc.

None of those things actually being a problem could give you false positives, so you might want some minor shuffling.


Or something a bit more high-level like Migra? https://github.com/djrobstep/migra


It seems like it could be used for declarative schema management (like https://www.skeema.io/) where you define the table structure you want in code and the application diffs a database and performs whatever changes is needed to reflect the requested schema.


I'm using it generate a "flush_database()" stored procedure (after applying the latest migrations), that means each "before each" of a test can "execute flush_database" to get a clean slate db for its boundary case (w/o having to know/lookup "oh we've got 10-20-50 some entity tables now").

So, yeah, just custom/minor infrastructure/tooling stuff that is based on the db schema.


Doesn't your RDBMS support transactions?


I have a similar project called `schemainspect` - It creates a data structure representation of a postgres schema.

I then use this data in my database diffing project `migra`, to autogenerate database migrations. You can also use schema diffing to test that your production and development databases match explicitly.

Overall it's a much more flexible but rigorous approach than the old-fashioned rails/django migrations.


I've made something similar which I use for two things. 1) I create typescript types from my schema, and 2) I run a "linter" over my schema that validates a number of rules that I've created. It's been a great help honestly.


For what it is worth, if you mean the `>` symbol in the banner graphic to be interpreted as an arrow not "greater than" if I were you I'd give the arrow some kind of tail (`->` not just `>`). My first interpretation was definitely "greater than" and I'm probably not alone.

If you _do_ really mean "pgsql > json` and not `pgsql -to-> json` then maybe I'm just confused.


Unix redirecting output uses the `>` symbol.


Oh, yeah. I am well aware of that and that interpretation did not enter my mind at all (being entirely surrounded by graphics and not in a conventional terminal-style font may be part of the reason why).

I still think it's a little confusing but I understand why it is that way at least.


Is there a good source for the queries run on different data base types (MySql, Postgres) to reflect it's schema?


The information schema is part of the SQL standard so the queries should be the same in all the databases that support that (which is quite a few). Here's the doc links for MySQL and PostgreSQL

https://dev.mysql.com/doc/refman/8.0/en/information-schema.h... https://www.postgresql.org/docs/12/information-schema.html

As it happens, I did a talk about this at re:Clojure this week. Videos aren't up yet but they will appear here when they are available

https://www.youtube.com/channel/UCbZW8yCqEncYciie8_1yy7w/fea...


One way I found helpful was to look at ORM libraries like sqlalchemy to see how they queried each db to translate schemas into objects.


Is the expected use case to build generic queries on any type of database?

Also why Javascript?




Join us for AI Startup School this June 16-17 in San Francisco!

Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: