I'm a huge Postgres fan and yes I'm probably biased towards it. I do think it's probably the best open source database out there, but I also don't love these types of posts.
What makes something better than another things is a whole host of things, not just highlighting one really bad case from the other side. I'm sure if the MySQL world wanted they could point out ways that MySQL is so much better than Postgres. Just look back at how long it too for us to get upsert, we're finally getting a better replication story in recent years. To take this one case and highlight it doesn't put the Postgres community in a positive light and it's a shame, because most I know within the community value good engineering work.
MySQL has had a lot of people that contribute to it and has some good things about it, it might be worthwhile for us to pay more attention to where they are better and just focus on improving Postgres and leaving things at that.
I remember ages ago (a decade? or so?) I started working at a company using Postgres for a project, and one of the things my boss asked me to do was to set up replication for the database. We had it on all of our MySQL databases, and everyone knew (myself included) that Postgres was a more serious, enterprise-y, "real" database.
Well, I googled around and couldn't find anything conclusive, so I hopped into the / channel for postgres and asked about it. "Is there any straightforward replication solutions for Postgres?"
It was like the early days of open source all over again. The first thing they did was challenge my needs. "Why do you need replication in the first place?" "Replication isn't a panacea you know." "Postgres doesn't have the issues that MySQL does that makes replication necessary." "Replication doesn't work well in the first place."
A ton of people jumped into what had been a quiet channel to make sure I knew that replication, which postgres didn't have, was something I didn't need in the first place, most of which happened before they even knew why I was asking.
After sorting through all that, a few people suggested "solutions". "Just have a cronjob rsync your data directory over to another machine every minute, and if your first server dies then you can just start postgres on the second one." Other solutions were much more convoluted. Patch postgres, add this software, manage it all yourself. And through it all, they kept telling me that I didn't need replication anyway, "but if you INSIST…".
In comparison, the MySQL channel on freenode was staffed with MySQL employees and volunteers who were always happy to help you solve your problem while teaching you what you were missing and giving you resources you needed. They'd answer questions, help your configuration, and even fix your SQL queries for you, all without being condescending about it or making you feel like an idiot.
And that's why I gave up on Postgres: the community. Postgres's was ridiculously hostile and self-important, verging on insecure and defensive. MySQL's was helpful, friendly, and full of resources. To this day I still hear great things about postgres, and I know it does a lot of things better than MySQL, but whenever someone writes a self-superior blog post about it I always think back to how much easier my life was by using a database where I could ask questions of the community and get useful answers.
I'm a Postgres fan too, and I remember this -- it really sucked. I loved Postgres, but setting up replication with it was basically duct-tape and string.
Not to derail the thread too much, but this is exactly what's going on with Go and generics right now (and I'm a Go fan too.) "Why do you need generics?" "Generics don't solve all your problems" "Generics don't work well" "No language does generics perfectly".
Sorry to hear that your interactions with the community was not great. My experience is quite different - perhaps I was lucky, or maybe it's because I started contributing to the project, not sure. But I stuck with PostgreSQL for exactly the same reason why you left it - the community.
I'm sure there are still people who respond in the "RTFM" or "Try rebooting it" style, but I doubt we're alone in that. One thing that is still generally disliked on the mailing lists are people treating it as unpaid production support ("Hey, I've started using your open source database, and not it failed and the production DB is down. Help me! Pronto!"). And I kinda share that opinion.
Perhaps it's worth giving the community another try? There's loads of new people contributing to it, so hopefully a more diverse audience.
Regarding the replication, before the built-in replication was added to PostgreSQL 9.0, the expectation was that replication will be implemented as a plugin, using some sort of internal API. That's kinda how PostgreSQL does stuff - it's extremely extensible (functions, data types, operators, indexes, ...) so the development community expected that to be the case for replication too.
But the API never materialized, because no one knew how it should look, because no one attempted to implement a plugin using a non-existent API. Chicken-egg problem. That changed though, and nowadays the built-in replication is considered a great feature.
Interestingly enough, the logical replication added gradually over the several recent releases (and particularly in PostgreSQL 10) is implemented using the API approach, and it worked nicely in this case.
Of course, this does not make your past experience any better, but hopefully it illustrates that the development community learned from that.
And it was the community that drew me in. Very early in my database career, I was having massive performance problems with my postgres install. I sent a very sharply toned message to the postgres performance mailing list. Within hours of that message, I was trading stack traces with Tom Lane. He was under no obligation to help me; in fact, I was, in hindsight, kinda a dick about it. But he did.
You probably can't buy support of that caliber, no matter how much money you throw at it.
> You probably can't buy support of that caliber, no matter how much money you throw at it.
You're underestimating how good support can be. Postgres doesn't actually have a primary sponsoring company so good (and equivalent) talent is widely available from multiple vendors.
As a Postgres fan, I do agree that the Postgres community has a more "RTFM attitude" than MySQL's. I personally don't have a problem with this, but I can understand that it might steer some people away.
Back in my SQL Server days they had something called "log shipping" which is basically sending the transaction log over the wire to another machine where it is re-run. Does that still exist?
I think warm standby is a relatively new feature for Postgres.
As someone who's had to configure warm standby recently, it's fine, but it's not active-active, which infuriated a few Oracle DBA colleagues. The best way to do that at the moment is by using a proxy like PgPool II.
Just look back at how long it too for us to get upsert, we're finally getting a better replication story in recent years.
Coincidentally, using those two features together on MySQL can have fun results:
"Because the results of INSERT ... SELECT statements depend on the ordering of rows from the SELECT and this order cannot always be guaranteed, it is possible when logging INSERT ... SELECT ON DUPLICATE KEY UPDATE statements for the master and the slave to diverge. (...) An INSERT ... ON DUPLICATE KEY UPDATE statement against a table having more than one unique or primary key is also marked as unsafe. (Bug #11765650, Bug #58637)."
You're wrong, and interestingly, you're omitting the key part:
> Such statements produce a warning in the error log when using statement-based mode and are written to the binary log using the row-based format when using MIXED mode
The problem you're writing about is not related to MySQL itself, it's a problem implicit in statement-based replication. Row/mixed mode replication has been available in MySQL since 5.1 (almost 10 years ago).
DBAs must know how replication works, as RDBMSs are not toys. It's naive to think that replication in other RDBMs itjustworks! - see, for example, the replication bug that caused slave corruption to the famous transportation company.
The choice to raise a warning rather than stopping replication is arguable, but it's far from being "insanity".
> DBAs must know how replication works, as RDBMSs are not toys. It's naive to think that replication in other RDBMs itjustworks! - see, for example, the replication bug that caused slave corruption to the famous transportation company.
I assume you mean Uber, and the replication bug they ran into on 9.2. Isn't it a bit strange comparing a bug (that got fixed promptly after it was reported) to implementing inherently unsafe replication mode?
Apologies if I confusingly replied to two posts in one, but there is common thread between the two, that is, a judgement on a product/feature, based on a radically uninformed knowledge of the subject.
I've spent a few minutes after a Google search; results follow.
Bug #58637: see https://bugs.mysql.com/bug.php?id=58637 -
this is not a bug; statement-based replication is obsolete, and it's superseded by mixed-mode - if users insist on using the former, they're asking for trouble.
If users insist? Statement-based replication remained the default until 2015, full five years after the bug was reported.
Even today, the "16.2.1 Replication Formats" page says nothing about it being obsolete, or provide indication of level of danger in choosing it. It's like walking on a minefield where the markers are camouflaged.
> An INSERT ... ON DUPLICATE KEY UPDATE statement against a table having more than one unique or primary key is also marked as unsafe.
Insanity. Also pretty much the reason I stopped using MySQL. I am not a DBA. A very small part of my job is actually interacting with the database, and I have to trust that statement execution will be consistent in all cases. That's something PostgresQL has been able to give for the past 10 years.
MySQL is still not there yet, despite being backed by a "database" company with over $30 billion in revenue per year. If Oracle, with all their domain experience and reputation incentive can't get MySQl into a sane and stable state, then its fundamentals must be built on sand.
I don't want to be bashing MySQL, but this nicely illustrates the project's approach to engineering - if there's an issue, put it into the docs as a known limitation.
Furthermore, MySQL manual actually has "Restrictions and Limitations" (which is an interesting read too), yet this "unsafe" nature of "INSERT ... ON DUPLICATE KEY UPDATE" is not mentioned there.
I'd say that if PostgreSQL learned something from MySQL, it's "Never accept feature that is incompatible with existing features, even if it's promised to be fixed soon."
From memory, they do something different for just that thread, like loading it from a different cache. I am sure that is all sorts of "wrong", but being "right" and not scaling seems worse!
Not being sarcastic: this type of article always seems to be written with an audience of "people who love to pick a side and hate the other side, but haven't picked a side yet".
The article picks an extreme example and uses insinuation and condescencion to denigrate MySQL based on one example, but it goes further than that. For example:
> Maybe its wrong to start a flame war on this
So he knows that's what he's doing, and he's okay with that.
> but I need to say, at a personal level, that this is exactly the reason why I have spent years contributing to PostgreSQL and never once contributed to MySQL.
Maybe, but if I wanted to contribute to Postgres where would I start? There's no bug tracker to look through. With the MySQL family, there is.
Also, I'm willing to bet that most people who've contributed to MySQL haven't contributed to Postgres and vice-versa, so to act as though you're doing it from some kind of principled stance rather than just practicality makes no sense to me.
This entire blog post is a giant attack ad against MySQL for the purpose of making it look bad. "MySQL left a bug unfixed for 14 years while its users' data slowly corrupted. Vote Postgres for RDBMS, and I promise to keep your auto_increment fields consistent, day in and day out."
> Not being sarcastic: this type of article always seems to be written with an audience of "people who love to pick a side and hate the other side, but haven't picked a side yet".
> The article picks an extreme example and uses insinuation and condescencion to denigrate MySQL based on one example, but it goes further than that. For example:
Yeah, I share this opinion. I may agree with some points Simon makes in the blog post, but I certainly dislike how it's presented.
> Maybe, but if I wanted to contribute to Postgres where would I start? There's no bug tracker to look through. With the MySQL family, there is.
I really doubt bug tracker is where people start their contributions. I mean, you don't go to a bug tracker, because (a) if it's broken someone else is probably already working on fixing it, and (b) the unsolved issues are rather complex and not quite suitable for new contributors.
There's actually a bunch of pages on the wiki that might help you:
I'd recommend picking a feature that matters to you and either add it (if it's missing) or improve it in some way. That's how I started contributing - I improved a bunch of stuff in the internal statistics tracking, because I needed that. I improved a bunch of performance regressions, because they were affecting the systems in our production systems. And so on.
Then start talking to people on pgsql-hackers. Send a patch, review patches from other people in the commitfests.
2nd Quadrant is a DBA-as-a-service company that does only Postgres, so it makes sense that the target audience is decision-makers trying to pick between databases who are themselves not DBAs (but probably in the position of hiring DBAs, either as normal employees or as a company to contract with).
FWIW we're not "DBA-as-a-service company". It's part of the job, but we do all sorts of stuff (support, consulting, trainings, custom development, ...)
The reality, for once, met my expectations in this article. Making imperative assumptions and backing them up with a poorly written 450 word count article that some angry (literally: "OMG sorry LMAO there is a 14,5 year old bug that was solved just now so the product must definitely be worse than the other one") developer wrote down in two minutes. That's what the title implied instantaneously.
Sometimes I don't understand the HN community. There is so much good stuff on this page, but at other times things that certainly don't deserve any attention make it to front page.
Perhaps a better title would have been: "One way in which PostgreSQL is better than MySQL". While I love Postgres, I have to admit that MySQL has its usefulness as well. I just did a side project where I started in node and postgres, realized I needed to hand it over to a guy who would want to end up hosting it on some 5 dollar a month PHP-esque plan. So, I bit the bullet, migrated the DB over to MySQL, and rewrote the backend in Laravel/PHP. It'll do just fine for his usage. My alternative was to host it on DigitalOcean, and then become the IT guy.... or perhaps spin up Heroku... but in the end, I just wanted to pass off a package and be done with it. MySQL was dead simple for my use.
In my experience writing software with generic SQL support the example shown here highlights something that's endemic in MySQL.
MySQL has a bug (eg. #199) or some deficiency (no subselects, no transactions, incorrect utf-8 support)[1]. The community's solution is a workaround (Here's how to restructure your select so that you don't need a subselect. Maybe you could use UUIDs instead of integer keys). Meanwhile the problem never existed in Postgres, which seemed to have been correctly and completely implemented from the start. After several years the issue is fixed in MySQL and the boosters ask that the score be reset and the clock re-started.
Postgres has its deficiencies and bugs but other than replication they never seemed core to the product in the way that they were for MySQL.
[1] These deficiencies span several decades and might seem unfair to bring up but I want to show that what the article is highlighting is just another round of the same problem.
This post actually makes me wonder... is there anyone that objectively still thinks MySQL is the better database of the two?
The only small advantage of MySQL I can think of (which is a great downside as well) is that it's really tolerant towards all sorts of erroneous usage and data.
The following should be a problem but are accepted by MySQL
- missing group by clauses
- all of the `ignore` commands (insert, alter table,
etc.)
- incorrect dates
Thank you, very interesting read! They do raise a few good points where MySQL outperforms PostgreSQL. I suppose that's one of the trade offs between them, data safety vs. performance.
really uber should not taken as an example, especially since they failed so hard with their databases in the past.
(I'm a Postgres fan, yes, but I think that there a valid reasons to migrate to MySQL (there are several other companies who use mysql, etc but Uber's stuff is just plain FUD))
Are you sure mysql doesn't enforce foreign key constraints? I recently did a project with mysql backend and it did threw constraint errors on invalid inputs that didn't match a parent key record.
You're partially correct. It does when inserting or updating, but if you do an alter ignore table you can create foreign key constraints that are actually invalid.
During most normal operations such as updates, inserts and deletes it will enforce them. But, opposed to other databases, you are allowed to completely ignore your foreign key constraints leaving you with a (in my opinion) possibly corrupt database.
SET FOREIGN_KEY_CHECKS = 0;
INSERT INTO ...
SET FOREIGN_KEY_CHECKS = 1;
Now you might have data in your database that's actually incorrect.
My intuition is that, if there's anything MySQL does uniquely well on the "being an ACID datastore" front, then this would let you get the best of both worlds—essentially letting some of your Postgres tables use MySQL's storage engines et al.
It's possible, but FWDs are not really a viable solution for "pluggable storage". It also limits what the planner/optimizer can do, and (obviously) it introduces latency because there's network communications.
FWD are a great thing when you need to occasionally access remote systems, but it's not a drop-in replacement for storage.
For example, how exactly would you do consistent backups?
I had hoped that this article will illustrate the scientific/technical reasons for PostgreSQL's superiority over MySQL. Alas, it just mentions that there was a bug that got fixed after 14+ years in MySQL. The author doesn't even mentions the severity of the bug and the negative effects it had. Overall, for me this article is nothing but the author just "blowing his/her own horn".
What annoys me about MySQL is that it gets simple basic SQL wrong. For example
update foo set a = b, b = a
is supposed to swap the values in columns a and b, according to the SQL standard. Logically, all of the right hand sides are evaluated using the pre-update values from the columns, and then the results are assigned.
In MySQL, it logically does the assignments left to right, not evaluating an assignment's right hand sides until it gets to that assignment, and that evaluation uses the updated values from the already evaluated assignments.
In Python terms, it is supposed to be like this:
a, b = b, a
but it is instead like this:
a = b
b = a
If we had the time, I'd love to switch to something else just to get rid of annoyances like that.
My only take away from this article was PostgreSQL doesn't use a bug tracker. I didn't even know there are large projects out there not using some tracking system for bugs/issues
> If you use PostgreSQL in the cloud or from a services company, make sure you ask them what they have contributed to PostgreSQL and what their capability is to fix bugs and with what SLA. The answer to that question is what keeps open source well funded for these latest developments and future ones too.
Reasonable sounding theory. How true is it, how significant?
Does that story describe how 2ndquadrant gets business -- other vendors have customers asking for an SLA on PostgreSQL fixes, which would require PostgreSQL experts to meet?
I'm curious about the strategic thinking behind a post like this. I'd think that the serious money that can be gained by taking marketshare from another database for a company like 2ndquadrant would be in businesses using (or contemplating) Oracle Database, not Oracle (or some version of) MySQL.
Is it just me, or was that huge bug with serious implications if you don’t know it exists? I don’t often use MySQL, maybe its well know in the community.
if you're not keeping score you're only practicing
So the main point why postgres is better is that they don't use a bug-tracking system? Definitely that makes any qualified comparison impossible of such claims as the better "time to fix" or bug responsiveness. IMHO this post is trying to sell a weakness as a strength.
FWIW that's a list of "open items" for PostgreSQL 10 release. So it's not really a "global" authoritative list of open issues, but something the project uses to make sure we deal with all known issues in the development version before the first release.
The claim that PostgreSQL does not have a bug tracker is not entirely accurate - there is no bugzilla/debbugs/...-style bugtracker (although it's a recurrent topic in discussions), but we do have pgsql-bugs mailing list. Which is mostly fine, as long as you fix bugs promptly and don't keep unsolved bugs lying around for years.
If you need to know what was fixed in a particular release, see either the release notes (high-level summary), or commits in the stable branches (those are typically just bugfixes, and the commit messages usually include a link to the relevant pgsql-bugs thread).
Wow, this surprises me a lot, especially combined with the consensus position that Postgres is good, robust software. What is Postgres doing right and how can other projects follow their lead?
On a more serious note, what makes Postgres so reliable is a number of things. For example:
* careful patch review process - Sometimes it's a bit grueling, and it takes time to get stuff in, but it's extremely valuable. Not only for finding bugs but alternative approaches to implementing the feature.
* careful testing - It's a natural part of the review process (Does the patch have tests? Can I come up with something weird that makes it fail?). Of course, there's a bunch of machines with very different configurations (OS, hw resources, ...) that are generally quite good in finding race conditions, incorrect assumptions in the code, ...
* bugfixing - Someone reported a bug on pgsql-bugs? Challenge accepted. Customer reported a bug internally? Better fix it ASAP.
Was Postgres always this way? Was the code from Berkeley abnormally stable/robust for commercial software (and extremely abnormally stable/robust for academic software), or was there an effort to clean it up to get to high quality?
That seems like a bit of a false dichotomy. Do the Postgres Devs genuinely see having a bug tracker as a "bad thing"? Or are they just stuck in their ways at this point?
A mailing list hardly seems like an efficient method for tracking bugs unless a major part of the efficiency is reducing signal to noise by make it more difficult for non-developer or non-paying users to file bugs
Well, it was not really meant entirely seriously, of course.
I can't speak for other devs contributing to the project, but I think the mailing list is a great way to do this. Ultimately, what you want is a simple way to track information about reported issues, discuss the details, make it reasonably simple to search for existing issues, and post patches fixing the issue. A mailing list does all of that, without having to deploy (and maintain and curate) a separate system.
Why do you think it's more difficult to report a bug this way, compared to a bug tracker? There's even a "report a bug" link on the project home page, where you can submit a bug report through a simple web form (you need a community account, but every other bug tracker I'm aware of requires that too).
Can you explain why you think a mailing list is less efficient than a bug tracker?
A good bug tracker makes it easier to track status and priority. The ironic thing about the blog post is that it is certainly possible there is a 14.5 year old bug in Postgres, but how would you know? Who is going to go look at 14.5 year old bug report emails and try to figure out if any of them are still open?
Contra: A 14.5 year old open issue that is genuinely untouched for 14.5 years is most likely irrelevant, either because it was swept up, has become obsolete or is some kind of request that no one cares about (otherwise it would not sit around untouched for 14.5 years).
One thing a bug tracker solves for me is, I have some time to work on things, what should I work on? (This is a different search problem from, I think I'm having this bug, has anyone else had it?)
With a mailing list, I can work on the most recent problems, and with my memory, I can work on the problems that most stand out in my memory (either because they affect me personally, or the process of solving them has attracted my background attention). Neither of those seem like good ways to prioritize things.
But with a bug tracker, I can look for old problems that have not had attention paid to them, and pay attention to them.
I certainly agree a mailing list is a good approach, and many bug-tracking systems operate essentially around mailing lists: RT adds a tracking number to the subject line, and Debbugs essentially creates one mailing list per bug.
> Why do you think it's more difficult to report a bug this way, compared to a bug tracker?
As an end-user the first thing I want to do before reporting a bug is search and see if it has already been reported or if it is being worked on and when it is expected to be released. If I was told that I had to dig through mailing list I would probably just stop before I even started.
> Can you explain why you think a mailing list is less efficient than a bug tracker?
As a bug reporter, I want a friendly way efficiently search and filter existing bugs based on status, version, subsystem, etc. I also want to be able to subscribe to a specific bug to get updates. Maybe you guys have some way to get all those features from a mailing list, but new users certainly won't have a clue.
As a developer I want to be able to easily tag, filter, assign and triage bugs. Assign priorities and associate them with releases and pull requests. I also want to be able to do high level reporting.
2017 and my dude still trying to sell consulting by applying the old emacs vs vim strategy. I usually defer to facebook using thousands of mysql and other big companies using way more instances of pgsql for day to day advice. Maybe next time, with a better blog or a matrix with some effective info noting the experience you guys have managing/developing against it. Also the patch for MySQL would be appreciated, why it did took you so long ? xoxox
What makes something better than another things is a whole host of things, not just highlighting one really bad case from the other side. I'm sure if the MySQL world wanted they could point out ways that MySQL is so much better than Postgres. Just look back at how long it too for us to get upsert, we're finally getting a better replication story in recent years. To take this one case and highlight it doesn't put the Postgres community in a positive light and it's a shame, because most I know within the community value good engineering work.
MySQL has had a lot of people that contribute to it and has some good things about it, it might be worthwhile for us to pay more attention to where they are better and just focus on improving Postgres and leaving things at that.