Anyone made the jump from MySQL to PostgreSQL? Is it worth it? (reddit.com)
152 points by hu3 on July 16, 2019 | 113 comments



If your project is already on MySQL and you have no issue => Don't change anything

If your project is already on MySQL and you have issues => understand the issues you are facing and make sure that moving to Postgres would fix them (99% chance it won't)

If you have a new project and have very precise information about the constraints you will face (pretty rare) => Do your research and choose what's best for your use case

If you have a new project and have only got a vague idea of what you are going to need/face 5 years down the road => Postgres is a slightly better choice than MySQL due to the feature set. Unless you have some other reason to go for MySQL (expertise, ...)


Maybe I've had the misfortune to work at companies that were especially bad at MySQL, or maybe I'm just a Bad Programmer (actually that's probably it), but I've never worked on a project using MySQL that didn't have issues which could have been resolved by switching to Postgres.

The first time you need to run a DDL statement on live data (usually: "2.sql"), you'll be thankful for transactions. The places where Postgres shines aren't exactly uncommon situations.


MySQL has transactions. MyISAM doesn’t, but the apples to apples comparison would be InnoDB, right?


MySQL doesn't have transactional DDL (i.e., ALTER TABLE), which Postgres does. This is very handy for migrations, as you can modify columns and update data in the same transaction and roll back if things go awry.
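For illustration, a minimal sketch of that kind of migration in Postgres (table and column names are made up):

  BEGIN;
  ALTER TABLE users ADD COLUMN full_name text;
  UPDATE users SET full_name = first_name || ' ' || last_name;
  -- If anything above fails or looks wrong, ROLLBACK undoes the DDL and the data change together.
  COMMIT;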


MySQL has lots of features which don't work in specific cases you will likely encounter some day. This happens very frequently (and of course it's also the case with transactions). When nothing quite works as you expect, I'd say it falls into the category of a "newb trap". Maybe it would behoove us to pick the options that have the fewest of those.


Your database is one place where the old rule of "if it ain't broke, don't fix it" should apply.


I agree, for a greenfield project PostgreSQL is quite a nice option - if you're already using MySQL then probably stick with that until performance forces a change.

All that said, I actually did lead a DB switch at my current gig - though it was quite a few years back, when PostgreSQL had a really clear feature lead. One of the biggest quick wins was moving a terrible string field like... `1-2-3-4` over to a slightly less terrible int[] field.
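Roughly what such a conversion looks like, with invented table and column names:

  ALTER TABLE widgets ADD COLUMN part_ids int[];
  UPDATE widgets SET part_ids = string_to_array(legacy_parts, '-')::int[];
  -- Once the new column is verified, the old one can be dropped:
  -- ALTER TABLE widgets DROP COLUMN legacy_parts;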


Also, when you have performance issues in a DB, it's likely you'd get performance issues on any DB, at which point solutions other than a DB migration can get more bang for the buck - memcached sessions, Solr searches, caching proxies, materialized pages, etc.


Having had to diagnose performance issues (poorly aligned/absent indexes, optimization fences, etc.), I much prefer PostgreSQL over MySQL, especially since PostgreSQL's query planner is much more advanced than MySQL's.

For highly patterned data access some of those options are quite good to investigate and invest in before getting serious about DB tuning (since, from my experience, once a company starts being serious about DB tuning it is a constant maintenance cost), but a lot of usage - especially at younger companies - is not such that caching will buy you much on the expensive components.

This comment is really conditional on a bunch of things so I wanted to clarify that I'm not disagreeing with your statement on performance directly, but I am disagreeing with it being a generalization. Performance is complicated and there is a lot to keep in mind.


True, I should have gone more in depth with it. The missing bit is that, assuming the DB schema is not completely borked, cores, RAM, and fast disk on a DB get you a long, long way. So the tipping point for performance is often when you hit a real scaling problem as a mature company, while a younger company can get by just purchasing more IOPS from a vendor, given the current total comp of a full-time DBA.

Of course, if one wants to get serious about tuning itself, all the analysis tooling available in and around Postgres is phenomenal - I think on par with Oracle's, albeit my experience there is stuck at 12i of the old times - which makes PG my default choice for any project.


But, if your database silently changes your data then the database should be considered broken.


If YOUR life is good but you want to continue to make your analysts' life a living hell => Don't change anything

Otherwise, switch to Postgres. At the very least, for the love of god, make sure your MySQL is >=v8.0


Or, you know, just export your data to a data lake like Redshift Spectrum, let the analysts pay for their usage by querying from S3, and take them out of the equation on how you run the website. There is no need for a one-size-fits-all solution here, nor a centralized approach.


Except postgres is a terrific backend for a web app, so not sure where you're going with that.

And using Spectrum or Athena and reading directly from the data lake has its own pros and cons and overhead costs. I wouldn't agree with your approach in most cases. Much better would be to ETL from Postgres into a vanilla Redshift DW or just set up a replica Postgres to query from. If you're using Redshift Spectrum you should probably take a look at Snowflake as an alternative.


This is in comparison to rewriting your web application from MySQL to pgsql so your analysts can work.

And I'm using Spectrum as a data lake solution generically. Whether you want to run Snowflake or some other thing depends on whether you are in AWS, GKE, bare metal, etc., and whether you want to host it yourself or not. I prefer not to have to manage hosting, and I run in AWS.

The important part is you don't need to tie your web app database to all of your needs. Separation of concerns is the important part and independent scaling for independent needs.


Exporting the data analysts want to a jurisdiction that has very bad privacy safeguards is unethical. Even if you're not exporting it between jurisdictions, exporting it to a separate company should not be a step that is easy to take for you.


Care to expand on why MySQL < 8 will be problematic for analysts? Anything other than the lack of window functions?


IMO the lack of window functions alone is a death knell.

CTEs/WITH clause is the other massive one - almost as important as window functions.

Additionally, EXCEPT/INTERSECT support, better EXPLAIN (it's still shit though), much improved REGEXP and JSON support.
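For anyone who hasn't needed them yet, a small sketch of a window function inside a CTE (schema invented); this works in Postgres and MySQL 8+, but not in MySQL 5.x:

  WITH ranked AS (
    SELECT customer_id, amount, created_at,
           row_number() OVER (PARTITION BY customer_id
                              ORDER BY created_at DESC) AS rn
    FROM orders
  )
  SELECT customer_id, amount FROM ranked WHERE rn = 1;  -- latest order per customer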


Possibly JSON operators


Window functions.


There are some caveats there.

If you are just running a web app, then this is the correct answer. If your business is in your database, you employ data analysts and want to glean some additional insights from your database, then it might be worth a look, though most people who chose MySQL did so because they were building a web application and don't employ analysts.


I think MySQL smells like poor engineering, and PostgreSQL like good engineering. That’s why my gut says PostgreSQL is the better choice, for the myriad issues you won’t face. It’s like Python vs PHP, in some ways.


And like PHP, an extraordinary amount of good engineering has been poured in after bad, resulting in quite workable, even good, products with technical debt. I might also make the comparison to MS-DOS/Windows.

Not sure why this is downvoted.


With sufficient thrust pigs fly, but that doesn't mean you should totally ignore aerodynamics when making a plane.


> ...It’s like Python vs PHP, in some ways.

Uh, did you notice where this was posted? I don't exactly disagree, but it's kinda ironic to raise that point.


How is that ironic? Because HN was written in Arc, denigrating PHP is ironic? I don't get it.

Edit: Oh, because the article is on the PHP subreddit. To be honest I file PHP developers using MySQL under expected, not ironic.


> I think MySQL smells like poor engineering, and PostgreSQL like good engineering

> It’s like Python vs PHP, in some ways.

Not just the ordering of the characterization, which would paint PHP in the positive light you clearly think is undeserved. SMH


It seems far more likely that the GP was imprecise in their analogy's ordering, rather than slamming python and praising PHP.


I'm a huge PG fan but switching your database on an existing project is a non-small undertaking.

If you're starting from scratch, then yes, you take PG every time. There's a lot...a lot that goes into saying that and if you haven't made the jump, a lot of the reasons aren't going to seem important because you currently "don't do" the things that PG lets you do with your data.

Like this: https://www.brightball.com/articles/why-should-you-learn-pos...

So the surface level stuff that you're experiencing with your database is going to be MySQL relative.

Probably the one that connects the most with the people who are working with MySQL is that you no longer have to worry about TEXT, MEDIUMTEXT, LONGTEXT.

Without really thinking about it, most people and ORMs just use TEXT, and every now and then you'll get some data to put into the field that doesn't fit... and it won't generate an error, it will just truncate the data. You'll end up needing to go and increase the field size wherever the problem happens (and then hope it doesn't happen again).

With PostgreSQL it's just TEXT. Done. Doesn't matter how big or small it is, it will fit the whole document. If the value is large (roughly 2 KB and up by default), it will be transparently compressed and stored out-of-line via TOAST.

I've tested by dropping in a 2mb XML document and it stored as 80kb.

That's just a tiny piece though.


That. PG won't silently truncate your texts. It also won't accept 0000-00-00 dates.

MySQL has enough data corruption bugs to last for a lifetime. It is also maintained by Oracle, which means the bug count is likely going up instead of down.


FWIW, there are newer defaults and modes that help with old quirks.

Sadly MySQL still lacks TIMESTAMP WITH TIME ZONE.
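The modes meant here are things like strict mode and the zero-date checks; a quick sketch of checking and enforcing them (they are the defaults in stock 5.7/8.0, but it's worth verifying on older or long-lived installs):

  -- See what is currently enforced:
  SELECT @@GLOBAL.sql_mode;
  -- Enforce strict behaviour explicitly (session-level shown; set it in my.cnf for real use):
  SET SESSION sql_mode = 'STRICT_TRANS_TABLES,NO_ZERO_DATE,NO_ZERO_IN_DATE,ERROR_FOR_DIVISION_BY_ZERO';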


The text column type is not where I'd advertise PG's differences. Rather, a much richer SQL query feature set and a smarter (though sometimes smartass) query optimizer.

MySQL is dumb (though predictably dumb), queries are single-threaded, and the feature set is limited. Some queries have no efficient representation, and you need to trade off the cost of materialising a derived table vs. repeated correlated queries.


Agreed. For sake of OP I was going with a known pain point in MySQL.


That, and the horror that is UTF handling on MySQL.


I have on all my personal projects but I wouldn't bother with an existing project unless I had a specific need that PostgreSQL met (and MySQL did not).

I'm actually all in on PostgreSQL at this point -- I'm not bothering with trying to use vanilla SQL and worrying about switching databases but I have that luxury on my side projects. PostgreSQL has a lot of interesting features that I am happy to use and I frankly never really relied on using vanilla SQL in order to be database server agnostic anyway (not saying that doesn't have a place and maybe someone will reply with how it was absolutely essential for their X but...).

To be clear, I still am all for portable SQL queries but I no longer worry about relying on a feature only PostgreSQL offers and/or taking advantage of things it offers that are not part of the standard.
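One concrete example of that kind of non-portable feature (just an illustration; table and column names are invented): PostgreSQL's upsert with RETURNING, assuming page_id has a unique constraint:

  INSERT INTO page_views (page_id, views)
  VALUES (42, 1)
  ON CONFLICT (page_id)
  DO UPDATE SET views = page_views.views + EXCLUDED.views
  RETURNING views;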


Are you talking about an ORM? An ORM is such a waste of time to learn: once you switch to another language, your time spent on the ORM is gone, and you haven't even learned the underlying tech, which is SQL.

Stick with SQL and you can keep that skill mostly the same across different SQL products. And obviously you will sooner or later hit some performance penalty when you start writing complex ORM queries and the ORM starts spitting out SQL without thinking, and then your only solution is to complain in a GitHub issue, as you can't solve it yourself.


This is bad advice. An ORM is a useful tool for smoothing over the generation of queries and their conversion into object graphs, and makes sense for many use cases.

It is true that using them effectively still requires knowing the underlying database technology, but that is a given - they are not meant to supplant that knowledge.

I will say though that every time I have encountered a codebase developed by a fanatically anti-ORM developer, it has contained a bug-riddled implementation of half of an application-specific ORM anyway…


"An ORM is a useful tool for smoothing over the generation of queries and their conversion into object graphs, and make sense for many use cases."

I'm finding it's more powerful to have (1) some language-convenient query generator, (2) query -> basic data structure, and (3) basic data structure -> final data structure broken into three separate parts, instead of bound together monolithically as most ORMs do it. Those steps are all useful on their own merits, but it's really quite frequent that for some specific task the prepackaged monolithic ORM is not what I need; either I need to tweak the query, or I want to get a basic data structure from some other source (JSON or something) and want to be able to reuse the logic, or I need a tweak to how I'm processing it down to an object (e.g., processing it into a class that is standalone vs. one that is integrated with the rest of the world, the "banana vs. a jungle containing a gorilla holding a banana" sort of thing Joe Armstrong talked about), or I want to create data structures from a complex query involving joins, aggregates, etc. that I can't necessarily express as "a class that represents a table" or something.

It's not the three tools that are the problem; it's the inflexibility of jamming them all together in one shot, along with the semantic contradictions encountered while trying to make one class be the query generator, the basic data structure representing the query, and the "final" data structure, all at once, even before we consider the affordance of the ORM that really, really encourages you to map DB structures to OO classes instead of queries.


Have you looked into sqlalchemy? I think it would meet some of your requirements; I'm not sure about the rest.


It is one of my favorites, yes, but alas, only works when I'm in Python, which for me personally is not very often. YMMV.


I'm working on three projects with three languages and three different ORMs, so you have my sympathy. They're the ORMs of Django, Rails, and Phoenix. Rails has the easiest ORM to use by far, maybe because IMHO it's now the closest to SQL. Phoenix's Ecto is needlessly complicated and Django's is as verbose as Python's libraries can get. Example: Model.objects.get(), because nobody could understand Model.get(), right? /s

So, over the years I sometimes felt like I had to relearn how to do SQL in arbitrary library X, for the sake of it.

However, ORMs have an advantage: no need to change queries when adding/removing fields in the database. No need to deserialize data from result sets. They usually migrate the database either by applying migrations to the database and inferring the models (Rails) or by syncing the DB with the model (Django). Weirdly, Ecto forces the developer to write both the migration and the model.

My ideal ORM would be something that lets me write

  sql("select * from employee join department
    on employee.department_id = department.id
    order by employee.id desc limit 10").each do |record|
   
    puts("#{record.employee.id}, #{record.department.name}")
I don't know how that would play with static typed languages but if all ORMs would be like that we could know only SQL and be able to perform arbitrarily complex queries.


Sounds like you don't want an ORM but a DB driver like psycopg2 http://initd.org/psycopg/docs/cursor.html#fetch


I know psycopg2 and other drivers for several languages. They're OK for making queries, but they lack everything else ORMs are good at.

I want an ORM, but a smarter one and as similar as possible across languages. Instead everybody reinvented the wheel in every language, with different approaches, more or less over-engineering their solution and making the polyglot experience as difficult as learning English and Russian instead of Italian, French, Portuguese, and Spanish (learn one, you'll get the others quickly).

My background is that I learned SQL almost 10 years before the first Java ORMs went mainstream. I've been using ORMs since the 90s, but for writing queries they're kind of a waste: the query from the example above, in Rails, is

  Employee.joins(:department).
    limit(10).
    order("employees.id desc")
at the cost of

1) defining some relationships between the tables in the models

2) learning how to translate SQL in Ruby

ActiveRecord is magically convenient most of the time. Sometimes it's quicker to write the SQL, sometimes it's not. Django (I can't remember the name of its ORM) and Ecto are nearly always slower to write than SQL, especially Ecto.

For complex queries SQL is the only way to go in any framework and the ORM is there only to deserialize the results, so why not use it for the simple ones too?

  Employee.find(params["id"])
is about the limit.


> Example: Model.objects.get(), because nobody could understand Model.get(), right? /s

That has a simple explanation: each Model can have multiple Managers [1], and the `objects` is the default one. Also, by convention, methods on the Model are usually relative to "one DB record", and methods on the manager are relative to N records.

[1]: https://docs.djangoproject.com/en/2.2/topics/db/managers/


Thank you, I now know why Django inflicts that .objects method on everybody :-)

I'd rather design a less verbose API for the default manager

  Model.get()
  Model.filter()
  ...
and Model.manager.get() if one really needs custom managers.

The fact that I didn't know about managers after years of Django (and nobody told me) should be telling about me and my team, of course, but also about how needed managers really are. Having to go through that API without shortcuts is not nice to developers.


You're welcome. I'm probably biased because Django has been my framework of choice for more than 10 years, but I really like the distinction between methods relative to the Model and methods that act on multiple records and the `objects` thing doesn't bother me.


I've used the Django ORM professionally for 8 years. Back when I started, migrations came in the form of the South app. I can remember one time when the ORM spat out bad SQL, and that was a migration where I botched something in the index of a column. I realized immediately, fixed the error and was done with it. I never once saw anything on Sentry with an SQL error. This is anecdotal and may speak to the genius of Andrew Godwin (and others on the Django team) but it is a data point.


I used to hate ORMs, but then I found Diesel for Rust, which checks that structs match the database schema, which is really nice when building basic CRUD apps. I ditch it when doing more complex work, but for simple tasks, it saves a lot of time and catches some errors at compile time that might not be caught as quickly at runtime.

Nothing beats knowing the underlying database, but an ORM certainly has its place.


We use PostgreSQL. We ran into the issue with Google Cloud SQL (PostgreSQL-flavored) that we cannot make it multi-region, only multi-zone. With Google Cloud SQL (MySQL-flavored) you can do multi-region.

I would feel much better if we had multi-region failover with PostgreSQL, just in case a Google data center region (which contains the multiple zones) goes completely down (as one did in early June 2019).

We have some manual solutions to this, but I would have preferred one that just worked with Google Cloud SQL out of the box. I think I am not the only one in this situation with this problem.


AWS Aurora's PostgreSQL offering also lags behind its MySQL one.


You also can't do multi-region on Google Cloud SQL (MySQL). The only thing you can do is run your replica on your own VM and configure it to read from the Master in Google Cloud SQL.

(Or I am completely missing something)


> Cloud SQL for PostgreSQL does not yet support replication from an external master or external replicas for Cloud SQL instances.

https://cloud.google.com/sql/docs/postgres/replication/


You can't do multi-region failover with MySQL either. BTW, multi-region Postgres is also not that easy.

I think the easiest solution would probably be to maintain multiple clusters and put something in front of them, like pgpool, and write to them simultaneously.

If one write fails, roll back all clusters, and hope the read replica of cluster X will take over. Probably something like pgpool + a Postgres cluster on k8s (Zalando, Crunchy).

At the moment there is no ledger that can span multiple regions, so you're on your own for that.

If people really need multi-region, they should probably just buy Citus Data.


I think if you're starting a project the amount of "gotcha" type stuff you'll hit in MySQL/Maria if you aren't already familiar with it is pretty significant compared to PostgreSQL. PostgreSQL is also less fragmented, which seems to be an increasing issue with the MySQL family.

But if you're already pretty used to MySQL and it's working fine in your project already, I would only switch to PostgreSQL if you think specific features would be very worthwhile.

It might be worth comparing features. I have a project that benefits enormously in terms of reduced complexity from the PostgreSQL array types, but that's somewhat rare I think.
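For anyone unfamiliar with them, a tiny sketch of the sort of thing an array column can replace (a join table, in simple cases); names are invented:

  CREATE TABLE posts (id bigserial PRIMARY KEY, tag_ids int[] NOT NULL DEFAULT '{}');
  CREATE INDEX posts_tag_ids_idx ON posts USING gin (tag_ids);
  SELECT * FROM posts WHERE tag_ids @> ARRAY[7];  -- posts tagged with tag 7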

Other things that might be useful if you have special/specific needs are PostGIS, table inheritance (not really a huge thing), and foreign data wrappers.

But really you shouldn't switch because you "might" need one of them.


So there are some nice things about Postgres, but by far the thing I miss the most when I go back to other systems (incl. MySQL/Maria) is transactional DDL. Knowing that a bad migration isn't going to leave my DB in a half-changed fucked up state is worth it.


This, and window functions and safety of data. I'm using MySQL and PostgreSQL in different projects (customer's choice) and sometimes there are things I can't do in the MySQL one, or that would be much more complicated.

Sometimes MySQL silently slips bad data into the DB because it truncates strings that don't fit their varchars, or allows those impossible 0000-00-00 00:00:00 dates. To be fair, the latter are not much of a problem.

I'd go PostgreSQL all the time but I'd be very wary about migrating a production database. It could be long and risky.


MySQL now has window functions:

https://mysqlserverteam.com/mysql-8-0-2-introducing-window-f...

And MySQL has strict mode, which I believe is enabled by default on newer versions but available in any 5.x version:

https://dev.mysql.com/doc/refman/5.7/en/sql-mode.html#sql-mo...


Strict mode is like 1/4th of the safety that PostgreSQL gives you. Robust types for columns. Safe transaction semantics. Better tools for enforcing data consistency.

MySQL has a long way to go before it can give you the same security for your data that PostgreSQL does.


> very wary about migrating a production database. It could be long and risky.

Heh, I've only done a few migrations from MySQL to PostgreSQL, and in every case the hardest part has been dealing with bad data in the source DB. It's always been pretty illuminating for the client when they discover how much bad data they actually have in the DB, even though their "ORM Validations" were keeping it "clean" to make up for MySQL's shortfalls.
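A rough sketch of the kind of pre-migration audit that surfaces this, run against the MySQL source (table and column names are made up):

  -- zero dates that PostgreSQL will reject outright
  SELECT COUNT(*) FROM orders WHERE created_at = '0000-00-00 00:00:00';
  -- "empty" values that slipped past the intended NOT NULL semantics
  SELECT COUNT(*) FROM users WHERE email = '';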


DDL?


DDL is the sub-language of SQL which alters schema. Eg create table, alter table, drop table are DDL commands.


Data definition language - the SQL for defining and updating tables and so on.

You want it transactional to enable rollbacks.


I recently migrated a small side project from MySQL to PostgreSQL. I wrote a short blog post about how I did the migration, issues I ran into etc: http://blog.minimum.se/2019/07/09/upgrading-node-mysql-to-po...

I'm overall very happy with the migration and PostgreSQL also ended up taking less memory (RSS) in prod, which was a nice bonus (this is useful for me because I'm running this on a low-end Digital Ocean droplet).


The best part of moving to PostgreSQL is that it is sort of the de facto standard wire protocol for relational databases now, and that enables you to "grow into" something like CockroachDB down the line when you need that sort of scale.


Can you explain your comment in more detail? I am not really sure what you're getting at with "the de facto standard wire protocol for relational databases" nor with "grow into something like CockroachDB".


In the case of CockroachDB (a distributed database), it has the same wire-level protocol as PostgreSQL. So if your programming language has a PostgreSQL driver (and it likely does), then you can talk to a CockroachDB database as well.

The SQL dialects are not the same, though they are similar, so you will have to go through your code to see if you need to fix anything. Moving the data itself over is fairly straightforward:

https://www.cockroachlabs.com/docs/stable/migrate-from-postg...


Ok, so the "driver protocol" would be another way of saying that.


Not parent. No idea about the de facto standard comment. CockroachDB's client protocol is* compatible with PostgreSQL. Edit: as well as the SQL syntax, that's the important bit.

Meaning you could in theory mirror data to CockroachDB and then point your clients to it instead of PostgreSQL. CockroachDB is aiming to solve horizontal scaling, hence the "grow" comment.


PostgreSQL underlies at least a plurality of SQL DBMS developments nowadays, especially for scale-out (distributed) projects.


Any sources on that assertion? A quick bit of Google searching [1] shows me only ~10% market share for PostgreSQL, compared to ~46% for MySQL.

[1] https://www.datanyze.com/market-share/databases/postgresql-v...


That's a joke right? I mean, you don't measure market share by looking at how many "websites" use one or the other database.


because wordpress


Yes. One time I joined a startup that was 3 months old and had used MySQL; a lot of the queries had manual joins that could have been replaced with CTEs/recursion. We also had good use for function indexes, partial indexes, and materialized views.

Is it worth it? In our case, BIG YES. But most of the time I don't think it would be. Migrating data isn't that big of a job, but rewriting the queries is. If you use an ORM for the majority of the queries, the switch may be simple.
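Not the actual schema, but a sketch of what "recursion instead of manual joins" typically looks like for hierarchical data:

  WITH RECURSIVE subordinates AS (
    SELECT id, manager_id, name
    FROM employees
    WHERE id = 1                      -- start at the root employee
    UNION ALL
    SELECT e.id, e.manager_id, e.name
    FROM employees e
    JOIN subordinates s ON e.manager_id = s.id
  )
  SELECT * FROM subordinates;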


I used MySQL till about 2012. Inherited an intricate Postgres database at a job and found lots of ways to tweak it that were not as easy in MySQL. I would only touch MySQL today if I'm not working on a greenfield.


This strikes me as bike shedding. If you have a specific problem that you believe will be solved by the migration, do it. Otherwise, users will benefit more if you focus on changes based on their value, not what part of your tech stack looks better on Reddit.


If you have a specific problem that you believe will be solved by the migration, seriously investigate the possibility of migrating. This is a big change, and it's likely to both solve and create problems; take it slow and steady, and keep your abort options open as long as possible.


This is a fair response.


It is bike-shedding in a way, but I think it’s also a direction PHP is taking (becoming more strict, more rigid, less “hacky”).

Switching to Postgres follows the same philosophy I think. Migrating existing services is usually a stupid thing to do, but anecdotally I know a number of PHP shops adding pg to their infra and building the new services there.

I’d compare it to the switch to unicode everywhere or normalizing dates on UTC.


I think a lot of people have a mental model of MySQL that's stuck in the 4.x days. You can get most of these proposed benefits by "just" upgrading to the latest MySQL and making sure that you're running it in strict mode.


Migrating from Oracle to Postgres is much more appealing because you can avoid the high Oracle license costs.

I don't really see that much benefit in converting from MySQL to Postgres.

While Postgres is a legit database, so is MySQL. If you're looking to scale MySQL, I recommend the book High Performance MySQL.

For greenfield projects I'd be comfortable with either MySQL or Postgres.


Here's some discussion from when Uber switched back from PostgreSQL to MySQL:

https://news.ycombinator.com/item?id=12166585

FWIW I prefer PostgreSQL.


The linked article is a good dive into the internals of Postgres vs MySQL.

However, Uber didn't really switch from Postgres to MySQL, they switched from Postgres to their home rolled document database which uses MySQL as the storage layer. [1]

[1] https://eng.uber.com/schemaless-part-one/


Markus Winand's write-up at https://use-the-index-luke.com/blog/2016-07-29/on-ubers-choi... really drives this point home. InnoDB has favorable characteristics for what they're trying to do.


I finished porting our very niche ORM to PostgreSQL after 15 years of MySQL only, and I must say that I have been able to do much more 'peephole optimising' (not sure if the term is applicable, but I mean very specific, very Postgres-specific optimisations vs. what I could do in MySQL), and especially for bigger datasets with more ad hoc querying PostgreSQL has helped us. Also, I like the codebase better; I did a lot of work (proprietary) on MySQL internals and I found it harder to work with. Maybe that changed, but I guess not.


As much as I like PostgreSQL's features, MySQL does have advantages in some situations.

The copy-on-write/MVCC type stuff in particular has bitten me hard in the past. It's great in most cases but can cause some serious issues with huge tables where every row is updated many times. Even with frequent table cleanup/analyze/etc., the tables bloat up and can become completely unworkable, whereas in MySQL there are no problems at all. Symptoms are things like the table sizes growing hundreds of GB larger than the data, queries which should use indexes becoming table scans, etc. The existence of multiple versions of a row seems to really upset the query planner in some cases. I have a table which takes several hours to do a basic conditional count in PostgreSQL, which will finish in milliseconds on MySQL.
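If you want to confirm dead row versions are the culprit, a rough first check is the dead-tuple counters Postgres keeps by default:

  SELECT relname, n_live_tup, n_dead_tup,
         pg_size_pretty(pg_total_relation_size(relid)) AS total_size
  FROM pg_stat_user_tables
  ORDER BY n_dead_tup DESC
  LIMIT 10;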


MySQL's InnoDB is MVCC as well, at least when used with transaction isolation.

That said, MySQL does have more mature replication options built-in.


I wasn't aware of that, but given the transaction capabilities, it makes sense. I primarily use InnoDB and have never had any similar performance issues, so I haven't looked into it heavily...

MySQL InnoDB tables handle the use case much, much better, so I'm just guessing, but it's probably a difference in how the indexes handle updates/multiple versions and how the query planner works for dirty indexes.


If you don't need transactions, then MySQL with the MyISAM engine has several aspects I love:

1: Blazingly fast, as it simply hands DB writes to the OS and benefits from all the caching and optimization the OS does. Depending on your storage medium, inserts/updates/deletes are often 100x faster than with an ACID-compliant engine.

2: Simple data structure. Want to see the sizes of your tables? "ls /var/lib/mysql/your_database" and you see each table as two files. One for the data, one for the indexes.

3: Easy to copy the data "cp -r /var/lib/mysql/your_database somewhere" and you are done.


As long as your application understands https://dev.mysql.com/doc/refman/8.0/en/internal-locking.htm... then MyISAM may work for you; it should just be relegated to a second-tier option, especially for new development. I feel like I have seen so many people overestimate the benefits of MyISAM when they later realize that they can't cleanly write all of the information in a single append unit and have to touch values after the first insert.

I generally hope people stop using it as a starting point and instead relegate it to a particular optimization crutch like in-memory tables.


I've done it twice; the context is completely different on why though.

Job #1, we were a heavy PHP/MySQL shop with a smattering of Ruby. I had come from a PG/MySQL shop and knew the merits and pitfalls of both databases (PG's autovacuum being a particular point of contention at my job before this one), and informed the CIO that going to PG instead of following the upgrade path to MySQL $latest had a number of minor improvements, and since our SQL was /mostly/ standard on the frontend, we could attempt the change.

I was in charge of testing and automating deployments of the test environments, so I set about massaging the data into PostgreSQL types. However, we found that our data had been silently corrupted over a long period of time. Dates were often very incorrect, and values of NOT NULL columns were filled with the "empty" value of whatever type it was (0 for int, "" for string, etc.).

We managed to get the data into PostgreSQL, and a lot of our tools stopped working; the commerce site itself was working thanks to the efforts of two really committed developers, but the back-office tooling was not ported. Due to the sheer number of queries we had which were relying on silent failure modes (which PostgreSQL does not allow), we ended up going back to MySQL rather than fixing them.

I don't work there anymore.

Job #2, we were a heavy C++/MySQL shop, MySQL just because it's what everyone knows, and because everyone knows it, it becomes a standard that is hard to move away from. I was tasked with getting ~50 KiB binary data blobs saved with a high update frequency, strong consistency, and ideally alongside their locks.

We tried to coerce MySQL into doing the right thing for a very long time; we rented this huge iron server which had 42 directly attached SSDs in RAID 0 (to test the limits of the software as a control, not for prod-like environments), 60 CPU threads, one of the best RAID cards you can buy, etc.

What we found was that MySQL couldn't climb above 600 qps for this workload, no matter what we did. We even ended up recompiling the kernel with a faster implementation of sockets and IPC, and changing the memory allocator MySQL used (which was unsafe), but we couldn't get it to move much faster than that.

So, on profiling we found that it was spinning on an internal lock, something we couldn't engineer around.

600 qps was well below our target, so we started looking at alternatives. It should be noted that one of the requirements was that the data be fsync'd before returning OK to the application. I say that because 99% of NoSQL databases only ensure that their data is in the VFS.

We tried PostgreSQL as our only non-NoSQL alternative, since its syntax is similar enough to MySQL's and easy to test. The C++ developers instantly fell in love with the C++ SDK for pgsql, referring to the MySQL equivalent as "braindead", and the database scaled linearly with I/O, even as we added additional arrays of drives, eventually saturating our 40 Gbit NIC.

Now we run PostgreSQL as our only RDBMS.

I still work there.


Both of these cases are the exact reasons why I end up finding MySQL concerning, especially going forward. There are just some use cases that, given the current MySQL locking design, I feel will never be fixed, now that MySQL seems headed further and further towards splintering as an ecosystem. These are the kind of extremely hard, change-the-major-version-number-level issues, and with the MariaDB/MySQL/cloud MySQL-compatible frontend ecosystem, I'm not entirely sure these things will ever get fixed.


Everyone's needs are different, but for me it was everything from partial and expression indexes to JSON and JSONB fields. I switched over to Postgres 100% some years ago and never went back. I find it to be much nicer to work with as a founder of a tech company who had to wear both the developer and DevOps hats for some time. I'm not familiar with the Google Cloud flavor, but we're building out with Postgres in development and staging and using Amazon's Aurora (Postgres compatible) in production.
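For anyone who hasn't used them, quick sketches of the features named above (table and column names invented):

  -- Partial index: index only the rows you actually query
  CREATE INDEX orders_pending_idx ON orders (created_at) WHERE status = 'pending';
  -- Expression index: index a computed value
  CREATE INDEX users_lower_email_idx ON users (lower(email));
  -- JSONB containment query
  SELECT * FROM events WHERE payload @> '{"type": "signup"}';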


A very trivial thing (having used both PostgreSQL and MySQL in production for many years): I find MySQL's EXPLAIN output much easier to grok than PostgreSQL's query plan.


I probably wouldn't migrate a project, but I switched my default db to PostgreSQL after seeing the many ways MySQL can silently screw up data. Truncating strings, truncating numbers, allowing NULLs in a NOT NULL, etc. Much (all?) of this can be fixed with settings these days: https://dev.mysql.com/doc/refman/5.7/en/constraint-invalid-d...

PostgreSQL has a hard bias toward correctness. If you give it invalid data, it will blow up loudly every time.

PostgreSQL also has a boatload of useful features. Need full-text search? It has it, and it's good enough for lots of stuff: http://rachbelaid.com/postgres-full-text-search-is-good-enou...
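The built-in search looks roughly like this (column names are hypothetical); for real use you would typically also index the tsvector expression with GIN:

  SELECT id, title
  FROM articles
  WHERE to_tsvector('english', title || ' ' || body)
        @@ to_tsquery('english', 'postgres & migration');

  CREATE INDEX articles_fts_idx ON articles
    USING gin (to_tsvector('english', title || ' ' || body));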

Need to search for locations with coordinates < N kilometers from a coastline? PostGIS does it.

Need to store and query JSON? Partition tables by a date range? Index only the rows matching a WHERE condition? Use a CHECK constraint? Prevent race conditions from creating reservation rows with overlapping datetime ranges (exclusion constraints)? Run a query periodically and cache the results as a queryable table (materialized views)? The list goes on and on.
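The exclusion-constraint case in particular is hard to get right any other way; a minimal sketch (requires the btree_gist extension; names are invented):

  CREATE EXTENSION IF NOT EXISTS btree_gist;
  CREATE TABLE reservations (
    room_id int       NOT NULL,
    during  tstzrange NOT NULL,
    EXCLUDE USING gist (room_id WITH =, during WITH &&)  -- no overlapping bookings per room
  );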

More correct data + fewer reasons to add another tool to the stack - especially when that would mean having to keep data in sync - is pretty compelling to me.


I made the jump; it's not better exactly, it just has a different set of problems.

For example, there is no unsigned or tinyint, advisory locks in PG are much worse than those in MySQL, true and false are "t"/"f" instead of "1"/"0", and some other things.

Yes, each item can be dealt with - but it's like I said, it's just a different set of problems.


Not being at the mercy of Oracle's whims is enough justification to consider migrating to MariaDB at the very least. MySQL is (miraculously) still alive for now, but it wouldn't be the first case of Oracle letting one of its acquisitions wither and die a slow, painful death.


Recently did this, as MySQL was just not handling large tables as well as we needed, and its new partitioning and clustering were lackluster for our purposes.

Without even implementing the partitioning and clustering - simply moving to Postgres has "reset" our performance problems on our system. We have tables with 10M+ rows in them that dragged on MySQL (Aurora RDS) - with Postgres (Aurora Postgres) things are running smoothly.

If you are dealing with huge amounts of data but not quite into "BigData" territory - I say go for it.

I just hope you have used an ORM or your queries are generalist enough to make the switch easy - even with an ORM it was a painful migration for us!


One of the biggest reasons I'm still using MySQL is Sequel Pro. Whoever is running PostgreSQL would do themselves a service by building a GUI of similar quality and UX polish.

https://www.sequelpro.com/


As a DBA who works with both MySQL and Postgres, I strongly prefer managing recent versions of MySQL for large SaaS systems.

Postgres has significant issues with vacuum and index write amplification, as noted in the Uber blog post a few years ago - nothing has changed.

Additionally, there is no clear compliance story regarding schema configuration, grants and restricting COPY with Postgres, while those are no-brainers with MySQL.

If you're an AWS user, then MySQL has more advanced managed options as of 2019.

In summary, Postgres is fine if you don't have paying users yet - any database would do fine.

But there's a reason Google and Facebook have stayed with MySQL, and that's because it is manageable at scale whereas Postgres simply isn't.

On the other hand, if you want to use Vertica (derived from the Postgres code base) as a column-store database, then managing it is the same as Postgres.


There were some comments regarding Postgres and MySQL in a Postgres release post before.

https://news.ycombinator.com/item?id=19872666


Isn't that about just switching the db driver in the ORM?

Sorry for being sarcastic, but was that not exactly the point of countless database abstraction layer projects?


Yes and no. The point of an actual ORM (object-relational mapper) is to be able to do the equivalent of Lisp/Smalltalk (runtime) image-level persistence: make objects permanent between restarts and make changes transactional.

To really get there, you actually need an object DB, like ZODB [z] for Python (which incidentally can use PG as a storage back-end).

If you're using a DB abstraction layer, and typical ActiveRecord-like surrogate keys, then for simple use cases you can indeed switch back-ends; many Rails projects run tests on SQLite and production on PostgreSQL.

That's not a great idea, but it works for simple projects (first gotcha: most RDBMSs don't have a global namespace for indexes; SQLite does).

But there's no standard AFAIK for full text indexes, geodata, json/bson, materialized views... So it depends a lot on how much of the db you're actually using.

[z] http://www.zodb.org/en/latest/tutorial.html

Note that an object db does not free you from thinking about data structure migrations:

http://www.zodb.org/en/latest/guide/writing-persistent-objec...

https://pypi.org/project/zope.generations/


Uh, no? The idea is to have a full featured object mapping layer that can work with multiple database backends.

Mapping DB result sets to objects is non-trivial; no need to develop it multiple times for each RDBMS when 90% of the code would overlap.

I'd work on the sarcastic attitude though


Yes and no. You still need to ETL the data from one DB to the other, and (quite probably) the SQL dialects will be different, so doing a dump'n'load isn't going to work.


For a real life metaphor, changing address in order to receive mail is easy; moving to a new house is the complicated and expensive part of the change.


And they only work that way if you switch the DB driver frequently during the development of your product.

The abstraction is useful. The abstraction is leaky.


Moved my prior company from MySQL to Postgres in 2004 and never looked back. The stability across time and through upgrades was remarkable.


Meta:

Thank you for linking to old.reddit.com. Personally I really hate the new interface at www.reddit.com.


If you need specific PostgreSQL features then yes, otherwise switch to MariaDB instead.


I was a bit bummed when I started my new gig at a MySQL shop after spending a few years enjoying all the new recent features of PostgreSQL. But then we switched to MariaDB and it basically has everything.

Definitely give it a look before considering switching.


I like them both depending on the use case


Why not Percona?


Yes



