Unfinished Business with Postgres (craigkerstiens.com)
165 points by ctoth on May 18, 2022 | 36 comments



Thank you for writing this bit of history, Craig! I have a personal half-serious theory that it was Heroku that really gave Postgres its break-through popularity. I think life is often like that: after years & years of hard work, you get just the right coincidence of external factors to let you take off. All during the 'oughts Postgres seemed like an eccentric ideological choice over MySQL, but most of us had never really tried it. With Heroku we were basically forced to use it, and suddenly we could see that all those eccentrics were right. I love Postgres and have tried to make it more & more a part of my own career. So thank you to you and the Heroku team for making such a principled and brave choice back then.


I don't think that's a bad theory, but I'd point more towards three particular events as critical:

1. Postgres 8.0 supporting Windows natively (I _think_ 8.0 was the first). That brought a ton of users into the community

2. Oracle's acquisition of MySQL, and subsequent forking. That caused a lot of people to look around for other solutions

3. Amazon RDS


> 2. Oracle's acquisition of MySQL, and subsequent forking. That caused a lot of people to look around for other solutions

It was this one for me.


I would add that it had a better reputation among users of "serious" databases. I first heard of Postgresql at companies built around big, expensive Oracle databases, managed by serious, battle-hardened DBAs with long experience with Oracle and DB2 and other big-iron databases. The ones who deigned to have an opinion about open-source databases spoke very differently of MySQL and Postgresql. They were openly derisive of MySQL and laughed it off as something that nobody who knew databases would ever use, but with PostgreSQL, a little bit of respect showed through their snobbery. They couldn't imagine building a business around it (they couldn't imagine building a business around anything that didn't cost the price of a decent car per CPU core) but they were willing to admit that the people who built it had probably seen a real database before, and maybe it was suitable for some non-business-critical uses where Oracle's licensing was cost-prohibitive. That left an impression on me and developers around me such that later, when we were in positions to choose databases for projects, we felt much safer going with Postgresql than with MySQL.


2. is what pushed most people over to PostgreSQL. That was the main impetus. Fortunate that, as MySQL is utter garbage, having silently corrupted data for a very long time, not being ANSI SQL compliant for a very long time, and the developers being unable to come up with one unified storage engine to handle many different loads efficiently, like Oracle can.

I was not one of those people though, but lucky enough to have had PostgreSQL running on DEC Alpha and GNU/Linux (Red Hat) as the core tool for the relational database course while I was getting my degree in computer science / information technology. A whole semester spent at the psql prompt doing SQL and programming the web application talking to it in PHP. That was back in the 20th century.


4. Oracle changing its licensing to be more hostile toward virtualization, right around VMware's peak


5. Postgres having a greater feature set than Oracle (e.g. schemas not tied to users), a perfect implementation of the SQL-92 standard, being consistent and easy to install, while Oracle was hell to install, followed by Oracle playing the lawyer game to threaten companies…


PostgreSQL still has nowhere near the features of Oracle: for example, it still cannot do synchronous multimaster replication, unless one buys Vertica.


I believe the parent didn't mean the Oracle DB but Oracle MySQL. This makes the most sense as Oracle's acquisition of MySQL (Sun) was discussed before.


The described issues fit Oracle DB better. Schemas tied to users is definitely the case for Oracle DB, and MySQL isn't that hard to install.


For sure, PostgreSQL is not as feature-complete as Oracle; however, it is the best open source DB and it is close enough. And not everyone needs multi-master replication, automated materialized views, etc. Many Oracle features can be custom-built via functions and stored procedures in Postgres.


Amazon RDS was MySQL first, so was Aurora.


I agree with you. Heroku PG was the driver for RDS PG from what I saw. People were using it without having any idea why, just because it was what Heroku had available.

At the same time, a lot of people kept trying to just use it as a dumb data store like MySQL without realizing exactly how much you could do with it. 90% of the time you don't need a dedicated search engine syncing and all the headaches that come with it, for example.


I suspect in part due to some modesty, Craig drastically undersells just how much of an effort they put into influencing this across the industry. Something Heroku was absolutely incredible at, that people don’t recognise enough, was that kind of bottom-up and almost organic shifting of the status quo. The time period we’re discussing here is when SQLite was the default database for Rails apps in dev and MySQL was the recommendation for production. Postgres was definitely an esoteric choice that went against the preferences of the core target customer base. But they persisted. By making the investment necessary to make Postgres with Rails amazing. By having dev advocates out there evangelising it. By hiring the most prominent people in the community to make it happen. It’s literally years of persistence and millions of dollars of investment to make stuff like that happen. And then they’d rinse and repeat with Python, Scala, Clojure, etc. etc.

That’s not to imply Heroku is the sole reason Postgres is where it is today. It’s the effort of tens of thousands of people in the community. But Heroku’s investment in playing their part to make that happen was very deliberate and intentional.


> eccentric ideological choice over MySQL

I was on the other side of this trade. My experience was that everyone used Oracle. Then PostgreSQL and MySQL appeared as OSS alternatives. After spending a day reading the dev mailing lists for both it was very clear to me I would never deploy MySQL in any situation where I had control over the DB choice. History seems to have proven that to be a reasonable position.


Back when I was a baby programmer in the early 00s picking tools to learn about (I'm self taught), I picked a few obscure looking things that looked like the right choice. These included R (which has now taken over the statistical world), Debian, Postgresql and Perl (good choice for me - worked out well).

Pg always looked like a better choice than MySQL to me. That was borne out during an early job, where MySQL disappeared some of my data that it really shouldn't have, due to pitfalls that were well understood, just not by me back then.


With over 15,000 packages at CRAN, R has evolved from a wonderful language for statistics to a wonderful general-purpose language. I came in contact with R by accident because a colleague of mine took a day off and the client approached me instead, and it's a love affair with R that has lasted since. I haven't been excited about a programming language since I was last learning the MC68000 family's assembler. I just cannot sing R's praises enough. R of course has packages for connecting to a PostgreSQL database, and Vertica is a PostgreSQL database with R as the scripting language. Such a wonderful combination. I'd gladly use and administer Vertica again without a second thought, given the opportunity.


Oh no, R as a general purpose language, argh!

I mean yeah whatever floats your boat but I find it torturous for anything other than as a specialist platform for doing linear algebra - aka statistical analysis.

I got a nice half hour meeting with one of the core R developers a few years ago back when I was pretending to be an academic - he was a regular collaborator with one of the senior academics in the business school that most of my work was coming from at the time.


I feel like everyone who worked with MySQL has some story about lost data. Two companies that I worked for used MySQL, and in both of them I had an instance where data was lost or corrupted and it did not involve hardware failure.

The first instance was a MySQL service that crashed and refused to start up. The machine had not been rebooted or shut down abruptly, the disk was not corrupted, and the disk was not full and had plenty of space. Running the repair command resolved the issue, but it was weird.

In another instance we ran into a bug where, apparently, a certain character combination in the data made MySQL all of a sudden think the data was encrypted.


It was many things, but I personally think what started the dominoes rolling was actually Python.

MySQL got popular because it was the database of choice on PHP, while on Python that was Postgres (via psycopg2).

If the database of choice for Python hadn't been Postgres, Heroku would likely have stuck with MySQL.

There were also other things that contributed to it, such as Oracle's purchase of Sun and the subsequent stalling of MySQL development.


Indeed, Heroku was the first time that I used Postgres, or databases in a project at all, really.


Quite a journey, Craig, and really funny to see how Heroku ended up with PostgreSQL. In my estimation, that decision is one of the biggest factors in the growth of PG adoption everywhere over the last decade. What a butterfly effect for that engineer chiming in.

Dataclips really was a great feature. We were using it for all of our internal dashboards at the company where I worked in 2013. One of our support staff even learned SQL due to interacting with it and went on to get a CS degree a few years later.


>As an early PM I first worked on billing

And then moved on to scalable RDBMS for a PaaS.

It's like a fairy tale, but one of the Grimm Bros. ones, not Disney.


>near miss when a disk was lost that caused a rather horrific amount of effort and some nailbiting in restoring from pgbackups.”

I hope that eliminating a single drive fault failure mode isn't part of the unfinished business.

Twelve years ago RAID 6 and 60 definitely existed. Battery backed FC arrays of considerable sophistication have existed for far longer. I'm thinking HDS arrays circa 2002 for peak redundancy complexity (before integration reduced parts and physical runtime concerns making the installation environment as important as the technology.)


One time, me, a noob, accidentally deleted files in the PG data directory. Oh no! But PG had open handles to them, so they weren't reclaimed by the FS and I was able to pg_dump. Not a production system, but the loss would've been "big". Just saying, PG itself is very resilient.


> One time, me, a noob, accidentally deleted files in the PG data directory. Oh no! But PG had open handles to them, so they weren't reclaimed by the FS and I was able to pg_dump. Not a production system, but the loss would've been "big". Just saying, PG itself is very resilient.

That isn't a PG thing; Linux[1] ext3/4 does not actually delete files (or directories) that are still in use. Only the name is removed. The file itself remains until the last open handle to it is closed.

[1] Amongst other OSes
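
For anyone who wants to see this for themselves, here is a minimal sketch in Python, assuming a POSIX-like system (Linux, macOS). The function name `read_after_unlink` is made up for illustration, and on Windows the `os.unlink` call on an open file would most likely fail instead.

```python
import os
import tempfile


def read_after_unlink(payload: bytes) -> bytes:
    """Write payload to a temp file, remove its name, then read it back
    through the file descriptor that is still open."""
    fd, path = tempfile.mkstemp()
    try:
        os.write(fd, payload)
        # Removes only the directory entry; the data stays on disk
        # because this process still holds an open descriptor.
        os.unlink(path)
        assert not os.path.exists(path)
        # Rewind and read through the surviving descriptor.
        os.lseek(fd, 0, os.SEEK_SET)
        return os.read(fd, len(payload))
    finally:
        # Closing the last open descriptor is what lets the
        # filesystem actually reclaim the space.
        os.close(fd)


if __name__ == "__main__":
    print(read_after_unlink(b"still here"))
```

That is roughly the situation in the quoted comment: the running server still held the files open, so the data stayed readable until those handles were closed.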


if you'd read the previous post carefully, you'd have seen they said that


> if you'd read the previous post carefully, you'd have seen they said that

I did read it carefully (and quoted the whole thing for context), but that final line made it seem that it was PG that was responsible for that particular instance of resilience.

Someone who isn't familiar with how ext3/4 handles deletes might, from the comment alone, assume that the same sort of thing would happen on other filesystems.


Did their clarification truly merit such a snarky response?


it wasn't clarification; it was repetition


No, the comment did make it sound like it was PG’s resilience that saved the data. In fact that seems to be the whole point of the comment.

So the snark does seem out of place and the clarification useful.


nice bit of white-knighting there


> nice bit of white-knighting there

Is it really constructive to throw insults[1]? I explained myself above, after all.

[1] I had to look it up to see what it means: https://en.wikipedia.org/wiki/White_knight. Since we aren't in a business context, I can only assume that you think I am female :-)


EBS has always been RAID. You need more.


I definitely had battery backed RAID installed in my servers in the year 2000. RAID 10 for me.


What were you using for your RAID solution on AWS in 2009?



