Anyone made the jump from MySQL to PostgreSQL? Is it worth it? (reddit.com)
152 points by hu3 on July 16, 2019 | 113 comments



If your project is already on MySQL and you have no issue => Don't change anything

If your project is already on MySQL and you have issues => understand the issues you are facing and make sure that moving to Postgres would fix them (99% chance it won't)

If you have a new project and have very precise information about the constraints you will face (pretty rare) => Do your research and choose what's best for your use case

If you have a new project and have only got a vague idea of what you are going to need/face 5 years down the road => Postgres is a slightly better choice than MySQL due to the feature set. Unless you have some other reason to go for MySQL (expertise, ...)


Maybe I've had the misfortune to work at companies that were especially bad at MySQL, or maybe I'm just a Bad Programmer (actually that's probably it), but I've never worked on a project using MySQL that didn't have issues which could have been resolved by switching to Postgres.

The first time you need to run a DDL statement on live data (usually: "2.sql"), you'll be thankful for transactions. The places where Postgres shines aren't exactly uncommon situations.


MySQL has transactions. MyISAM doesn’t, but the apples to apples comparison would be InnoDB, right?


MySQL doesn't have transactional DDL (i.e., ALTER TABLE), which Postgres does. This is very handy for migrations, as you can modify columns and update data in the same transaction and roll back if things go awry.
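For illustration, a minimal sketch of that kind of migration in Postgres (table and column names are made up):

  BEGIN;
  ALTER TABLE users ADD COLUMN full_name text;
  UPDATE users SET full_name = first_name || ' ' || last_name;
  -- If anything above fails or looks wrong, ROLLBACK undoes the DDL and the data change together.
  COMMIT;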


MySQL has lots of features which don't work in specific cases you will likely encounter some day. This happens very frequently (and of course it's also the case with transactions). When nothing quite works as you expect, I'd say it falls into the category of a "newb trap". Maybe it would behoove us to pick the options that have the fewest of those.


Your database is one place where the old rule of "if it ain't broke, don't fix it" should apply.


I agree, for a greenfield project PostgreSQL is quite a nice option - if you're already using MySQL then probably stick with that until performance forces a change.

All that said, I actually did lead a DB switch at my current gig - though it was quite a few years back, when PostgreSQL had a really clear feature lead. One of the biggest quick wins was moving a terrible string field like... `1-2-3-4` over to a slightly less terrible int[] field.
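Roughly what such a conversion looks like, with invented table and column names:

  ALTER TABLE widgets ADD COLUMN part_ids int[];
  UPDATE widgets SET part_ids = string_to_array(legacy_parts, '-')::int[];
  -- Once the new column is verified, the old one can be dropped:
  -- ALTER TABLE widgets DROP COLUMN legacy_parts;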


Also, when you have performance issues in a DB, it's likely you'd get performance issues on any DB, at which point solutions other than a DB migration can get more bang for the buck - memcached sessions, Solr searches, caching proxies, materialized pages, etc.


Having had to diagnose performance issues (poorly aligned/absent indexes, optimization fences, etc.), I much prefer PostgreSQL over MySQL, especially since PostgreSQL's query planner is much more advanced than MySQL's.

For highly patterned data access some of those options are quite good to investigate and invest in before getting serious about DB tuning (since, from my experience, once a company starts being serious about DB tuning it is a constant maintenance cost), but a lot of usage - especially at younger companies - is not such that caching will buy you much on the expensive components.

This comment is really conditional on a bunch of things so I wanted to clarify that I'm not disagreeing with your statement on performance directly, but I am disagreeing with it being a generalization. Performance is complicated and there is a lot to keep in mind.


True, I should have gone more in depth with it. The missing bit is that, assuming the DB schema is not completely borked, cores, RAM, and fast disk on a DB get you a long, long way. So the tipping point for performance is often when you hit a real scaling problem as a mature company, while a younger company can get by just purchasing more IOPS from a vendor, given the current total comp of a full-time DBA.

Of course, if one wants to get serious about tuning itself, all the analysis tooling available in and around Postgres is phenomenal - I think on par with Oracle's, albeit my experience there is stuck at 12i of the old times - which makes PG my default choice for any project.


But, if your database silently changes your data then the database should be considered broken.


If YOUR life is good but you want to continue to make your analysts' life a living hell => Don't change anything

Otherwise, switch to Postgres. At the very least, for the love of god, make sure your MySQL is >=v8.0


Or, you know, just export your data to a data lake like Redshift Spectrum, let the analysts pay for their usage by querying from S3, and take them out of the equation on how you run the website. There is no need for a one-size-fits-all solution here, nor a centralized approach.


Except postgres is a terrific backend for a web app, so not sure where you're going with that.

And using Spectrum or Athena and reading directly from the data lake has its own pros and cons and overhead costs. I wouldn't agree with your approach in most cases. Much better would be to ETL from Postgres into a vanilla Redshift DW or just set up a replica Postgres to query from. If you're using Redshift Spectrum you should probably take a look at Snowflake as an alternative.


This is in comparison to rewriting your web application from MySQL to pgsql so your analysts can work.

And I'm using Spectrum as a data lake solution generically. Whether you want to run Snowflake or some other thing depends on whether you are in AWS, GKE, bare metal, etc., and whether you want to host it yourself or not. I prefer not to have to manage hosting, and I run in AWS.

The important part is you don't need to tie your web app database to all of your needs. Separation of concerns is the important part and independent scaling for independent needs.


Exporting the data analysts want to a jurisdiction that has very bad privacy safeguards is unethical. Even if you're not exporting it between jurisdictions, exporting it to a separate company should not be a step that is easy to take for you.


Care to expand on why MySQL < 8 will be problematic for analysts? Anything other than the lack of window functions?


IMO the lack of window functions alone is a death knell.

CTEs/WITH clause is the other massive one - almost as important as window functions.

Additionally, EXCEPT/INTERSECT support, better EXPLAIN (it's still shit though), much improved REGEXP and JSON support.
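For anyone who hasn't needed them yet, a small sketch of a window function inside a CTE (schema invented); this works in Postgres and MySQL 8+, but not in MySQL 5.x:

  WITH ranked AS (
    SELECT customer_id, amount, created_at,
           row_number() OVER (PARTITION BY customer_id
                              ORDER BY created_at DESC) AS rn
    FROM orders
  )
  SELECT customer_id, amount FROM ranked WHERE rn = 1;  -- latest order per customer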


Possibly JSON operators


Window functions.


There are some caveats there.

If you are just running a web app, then this is the correct answer. If your business is in your database, you employ data analysts and want to glean some additional insights from your database, then it might be worth a look, though most people who chose MySQL did so because they were building a web application and don't employ analysts.


I think MySQL smells like poor engineering, and PostgreSQL like good engineering. That’s why my gut says PostgreSQL is the better choice, for the myriad issues you won’t face. It’s like Python vs PHP, in some ways.


And like PHP, an extraordinary amount of good engineering has been poured in after bad, resulting in quite workable, even good, products with technical debt. I might also make the comparison to MS-DOS/Windows.

Not sure why this is downvoted.


With sufficient thrust pigs fly, but that doesn't mean you should totally ignore aerodynamics when making a plane.


> ...It’s like Python vs PHP, in some ways.

Uh, did you notice where this was posted? I don't exactly disagree, but it's kinda ironic to raise that point.


How is that ironic? Because HN was written in Arc, denigrating PHP is ironic? I don't get it.

Edit: Oh, because the article is on the PHP subreddit. To be honest I file PHP developers using MySQL under expected, not ironic.


> I think MySQL smells like poor engineering, and PostgreSQL like good engineering

> It’s like Python vs PHP, in some ways.

Not just the ordering of the characterization, which would paint PHP in the positive light you clearly think is undeserved. SMH


It seems far more likely that the GP was imprecise in their analogy's ordering, rather than slamming python and praising PHP.


I'm a huge PG fan but switching your database on an existing project is a non-small undertaking.

If you're starting from scratch, then yes, you take PG every time. There's a lot...a lot that goes into saying that and if you haven't made the jump, a lot of the reasons aren't going to seem important because you currently "don't do" the things that PG lets you do with your data.

Like this: https://www.brightball.com/articles/why-should-you-learn-pos...

So the surface level stuff that you're experiencing with your database is going to be MySQL relative.

Probably the one that connects the most with the people who are working with MySQL is that you no longer have to worry about TEXT, MEDIUMTEXT, LONGTEXT.

Without really thinking about it, most people and ORMs just use TEXT, and every now and then you'll get some data to put into the field that doesn't fit... and it won't generate an error, it will just truncate the data. You'll end up needing to go and increase the field size wherever the problem happens (and then hope it doesn't happen again).

With PostgreSQL it's just TEXT. Done. Doesn't matter how big or small it is, it will fit the whole document. If the value is large (roughly 2 KB and up by default), it will be transparently compressed and stored out-of-line via TOAST.

I've tested by dropping in a 2mb XML document and it stored as 80kb.

That's just a tiny piece though.


That. PG won't silently truncate your texts. It also won't accept 0000-00-00 dates.

MySQL has enough data corruption bugs to last for a lifetime. It is also maintained by Oracle, which means the bug count is likely going up instead of down.


FWIW, there are newer defaults and modes that help with old quirks.

Sadly MySQL still lacks TIMESTAMP WITH TIME ZONE.
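The modes meant here are things like strict mode and the zero-date checks; a quick sketch of checking and enforcing them (they are the defaults in stock 5.7/8.0, but it's worth verifying on older or long-lived installs):

  -- See what is currently enforced:
  SELECT @@GLOBAL.sql_mode;
  -- Enforce strict behaviour explicitly (session-level shown; set it in my.cnf for real use):
  SET SESSION sql_mode = 'STRICT_TRANS_TABLES,NO_ZERO_DATE,NO_ZERO_IN_DATE,ERROR_FOR_DIVISION_BY_ZERO';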


The text column type is not where I'd advertise PG's differences. Rather, a much richer SQL query feature set and a smarter (though sometimes smartass) query optimizer.

MySQL is dumb (though predictably dumb), queries are single-threaded, and the feature set is limited. Some queries have no efficient representation, and you need to trade off the cost of materialising a derived table vs. repeated correlated queries.


Agreed. For sake of OP I was going with a known pain point in MySQL.


That, and the horror that is UTF handling on MySQL.


I have on all my personal projects but I wouldn't bother with an existing project unless I had a specific need that PostgreSQL met (and MySQL did not).

I'm actually all in on PostgreSQL at this point -- I'm not bothering with trying to use vanilla SQL and worrying about switching databases but I have that luxury on my side projects. PostgreSQL has a lot of interesting features that I am happy to use and I frankly never really relied on using vanilla SQL in order to be database server agnostic anyway (not saying that doesn't have a place and maybe someone will reply with how it was absolutely essential for their X but...).

To be clear, I still am all for portable SQL queries but I no longer worry about relying on a feature only PostgreSQL offers and/or taking advantage of things it offers that are not part of the standard.
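One concrete example of that kind of non-portable feature (just an illustration; table and column names are invented): PostgreSQL's upsert with RETURNING, assuming page_id has a unique constraint:

  INSERT INTO page_views (page_id, views)
  VALUES (42, 1)
  ON CONFLICT (page_id)
  DO UPDATE SET views = page_views.views + EXCLUDED.views
  RETURNING views;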


Are you talking about an ORM? An ORM is such a waste of time to learn: once you switch to another language, your time spent on the ORM is gone, and you haven't even learned the underlying tech, which is SQL.

Stick with SQL and you can keep that skill mostly the same across different SQL products. And obviously you will sooner or later hit some performance penalty when you start writing complex ORM queries and the ORM starts spitting out SQL without thinking, and then your only solution is to complain in a GitHub issue, as you can't solve it yourself.


This is bad advice. An ORM is a useful tool for smoothing over the generation of queries and their conversion into object graphs, and makes sense for many use cases.

It is true that using them effectively still requires knowing the underlying database technology, but that is a given - they are not meant to supplant that knowledge.

I will say though that every time I have encountered a codebase developed by a fanatically anti-ORM developer, it has contained a bug-riddled implementation of half of an application-specific ORM anyway…


"An ORM is a useful tool for smoothing over the generation of queries and their conversion into object graphs, and make sense for many use cases."

I'm finding it's more powerful to have (1) some language-convenient query generator, (2) query -> basic data structure, and (3) basic data structure -> final data structure broken into three separate parts, instead of bound together monolithically as most ORMs do it. Those steps are all useful on their own merits, but it's really quite frequent that for some specific task the prepackaged monolithic ORM is not what I need; either I need to tweak the query, or I want to get a basic data structure from some other source (JSON or something) and want to be able to reuse the logic, or I need a tweak to how I'm processing it down to an object (e.g., processing it into a class that is standalone vs. one that is integrated with the rest of the world, the "banana vs. a jungle containing a gorilla holding a banana" sort of thing Joe Armstrong talked about), or I want to create data structures from a complex query involving joins, aggregates, etc. that I can't necessarily express as "a class that represents a table" or something.

It's not the three tools that are the problem; it's the inflexibility of jamming them all together in one shot, along with the semantic contradictions encountered while trying to make one class be the query generator, the basic data structure representing the query, and the "final" data structure, all at once, even before we consider the affordance of the ORM that really, really encourages you to map DB structures to OO classes instead of queries.


Have you looked into sqlalchemy? I think it would meet some of your requirements; I'm not sure about the rest.


It is one of my favorites, yes, but alas, only works when I'm in Python, which for me personally is not very often. YMMV.


I'm working on three projects with three languages and three different ORMs, so you have my sympathy. They're the ORMs of Django, Rails, and Phoenix. Rails has the easiest ORM to use by far, maybe because IMHO it's now the closest to SQL. Phoenix's Ecto is needlessly complicated and Django's is as verbose as Python's libraries can get. Example: Model.objects.get(), because nobody could understand Model.get(), right? /s

So, over the years I sometimes felt like I had to relearn how to do SQL in arbitrary library X, for the sake of it.

However, ORMs have an advantage: no need to change queries when adding/removing fields in the database. No need to deserialize data from result sets. They usually migrate the database either by applying migrations to the database and inferring the models (Rails) or by syncing the DB with the model (Django). Weirdly, Ecto forces the developer to write both the migration and the model.

My ideal ORM would be something that lets me write

  sql("select * from employee join department
    on employee.department_id = department.id
    order by employee.id desc limit 10").each do |record|
   
    puts("#{record.employee.id}, #{record.department.name}")
I don't know how that would play with static typed languages but if all ORMs would be like that we could know only SQL and be able to perform arbitrarily complex queries.


Sounds like you don't want an ORM but a DB driver like psycopg2 http://initd.org/psycopg/docs/cursor.html#fetch


I know psycopg2 and other drivers for several languages. They're OK for making queries, but they lack everything else ORMs are good at.

I want an ORM, but a smarter one and as similar as possible across languages. Instead everybody reinvented the wheel in every language, with different approaches, more or less over-engineering their solution and making the polyglot experience as difficult as learning English and Russian instead of Italian, French, Portuguese, and Spanish (learn one, you'll get the others quickly).

My background is that I learned SQL almost 10 years before the first Java ORMs went mainstream. I've been using ORMs since the 90s, but for writing queries they're kind of a waste: the query from the example above, in Rails, is

  Employee.joins(:department).
    limit(10).
    order("employees.id desc")
at the cost of

1) defining some relationships between the tables in the models

2) learning how to translate SQL in Ruby

ActiveRecord is magically convenient most of the time. Sometimes it's quicker to write the SQL, sometimes it's not. Django (I can't remember the name of its ORM) and Ecto are nearly always slower to write than SQL, especially Ecto.

For complex queries SQL is the only way to go in any framework and the ORM is there only to deserialize the results, so why not use it for the simple ones too?

  Employee.find(params["id"])
is about the limit.


> Example: Model.objects.get(), because nobody could understand Model.get(), right? /s

That has a simple explanation: each Model can have multiple Managers [1], and the `objects` is the default one. Also, by convention, methods on the Model are usually relative to "one DB record", and methods on the manager are relative to N records.

[1]: https://docs.djangoproject.com/en/2.2/topics/db/managers/


Thank you, I now know why Django inflicts that .objects method on everybody :-)

I'd rather design a less verbose API for the default manager

  Model.get()
  Model.filter()
  ...
and Model.manager.get() if one really needs custom managers.

The fact that I didn't know about managers after years of Django (and nobody told me) should be telling about me and my team, of course, but also about how needed managers really are. Having to go through that API without shortcuts is not nice to developers.


You're welcome. I'm probably biased because Django has been my framework of choice for more than 10 years, but I really like the distinction between methods relative to the Model and methods that act on multiple records and the `objects` thing doesn't bother me.


I've used the Django ORM professionally for 8 years. Back when I started, migrations came in the form of the South app. I can remember one time when the ORM spat out bad SQL, and that was a migration where I botched something in the index of a column. I realized immediately, fixed the error and was done with it. I never once saw anything on Sentry with an SQL error. This is anecdotal and may speak to the genius of Andrew Godwin (and others on the Django team) but it is a data point.


I used to hate ORMs, but then I found Diesel for Rust, which checks that structs match the database schema, which is really nice when building basic CRUD apps. I ditch it when doing more complex work, but for simple tasks, it saves a lot of time and catches some errors at compile time that might not be caught as quickly at runtime.

Nothing beats knowing the underlying database, but an ORM certainly has its place.


We use PostgreSQL. We ran into the issue with Google Cloud SQL (PostgreSQL-flavored) that we cannot make it multi-region, only multi-zone. With Google Cloud SQL (MySQL-flavored) you can do multi-region.

I would feel much better if we had multi-region failover with PostgreSQL, just in case a Google data center region (which contains the multiple zones) goes completely down (as one did in early June 2019).

We have some manual solutions to this, but I would have preferred one that just worked with Google Cloud SQL out of the box. I think I am not the only one in this situation with this problem.


AWS Aurora's PostgreSQL offering also lags behind its MySQL one.


You also can't do multi-region on Google Cloud SQL (MySQL). The only thing you can do is run your replica on your own VM and configure it to read from the Master in Google Cloud SQL.

(Or I am completely missing something)


> Cloud SQL for PostgreSQL does not yet support replication from an external master or external replicas for Cloud SQL instances.

https://cloud.google.com/sql/docs/postgres/replication/


You can't do multi-region failover with MySQL either. BTW, multi-region Postgres is also not that easy.

I think the easiest solution would probably be to maintain multiple clusters and put something in front of them, like pgpool, and write to them simultaneously.

If one write fails, roll back all clusters, and hope the read replica of cluster X will take over. Probably something like pgpool + a Postgres cluster on k8s (Zalando, Crunchy).

At the moment there is no ledger that can span multiple regions, so you're on your own for that.

If people really need multi-region, they should probably just buy Citus Data.


I think if you're starting a project the amount of "gotcha" type stuff you'll hit in MySQL/Maria if you aren't already familiar with it is pretty significant compared to PostgreSQL. PostgreSQL is also less fragmented, which seems to be an increasing issue with the MySQL family.

But if you're already pretty used to MySQL and it's working fine in your project already, I would only switch to PostgreSQL if you think specific features would be very worthwhile.

It might be worth comparing features. I have a project that benefits enormously in terms of reduced complexity from the PostgreSQL array types, but that's somewhat rare I think.
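For anyone unfamiliar with them, a tiny sketch of the sort of thing an array column can replace (a join table, in simple cases); names are invented:

  CREATE TABLE posts (id bigserial PRIMARY KEY, tag_ids int[] NOT NULL DEFAULT '{}');
  CREATE INDEX posts_tag_ids_idx ON posts USING gin (tag_ids);
  SELECT * FROM posts WHERE tag_ids @> ARRAY[7];  -- posts tagged with tag 7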

Other things that might be useful if you have special/specific needs are PostGIS, table inheritance (not really a huge thing), and foreign data wrappers.

But really you shouldn't switch because you "might" need one of them.


So there are some nice things about Postgres, but by far the thing I miss the most when I go back to other systems (incl. MySQL/Maria) is transactional DDL. Knowing that a bad migration isn't going to leave my DB in a half-changed fucked up state is worth it.


This, and window functions and safety of data. I'm using MySQL and PostgreSQL in different projects (customer's choice) and sometimes there are things I can't do in the MySQL one, or that would be much more complicated.

Sometimes MySQL silently slips bad data into the DB because it truncates strings that don't fit their varchars, or allows those impossible 0000-00-00 00:00:00 dates. To be fair, the latter are not much of a problem.

I'd go PostgreSQL all the time but I'd be very wary about migrating a production database. It could be long and risky.


MySQL now has window functions:

https://mysqlserverteam.com/mysql-8-0-2-introducing-window-f...

And MySQL has strict mode, which I believe is enabled by default on newer versions but available in any 5.x version:

https://dev.mysql.com/doc/refman/5.7/en/sql-mode.html#sql-mo...


Strict mode is like 1/4th of the safety that PostgreSQL gives you. Robust types for columns. Safe transaction semantics. Better tools for enforcing data consistency.

MySQL has a long way to go before it can give you the same security for your data that PostgreSQL does.


> very wary about migrating a production database. It could be long and risky.

Heh, I've only done a few migrations from MySQL to PostgreSQL, and in every case the hardest part has been dealing with bad data in the source DB. It's always been pretty illuminating for the client when they discover how much bad data they actually have in the DB, even though their "ORM Validations" were keeping it "clean" to make up for MySQL's shortfalls.
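A rough sketch of the kind of pre-migration audit that surfaces this, run against the MySQL source (table and column names are made up):

  -- zero dates that PostgreSQL will reject outright
  SELECT COUNT(*) FROM orders WHERE created_at = '0000-00-00 00:00:00';
  -- "empty" values that slipped past the intended NOT NULL semantics
  SELECT COUNT(*) FROM users WHERE email = '';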


DDL?


DDL is the sub-language of SQL which alters schema. Eg create table, alter table, drop table are DDL commands.


Data definition language - the SQL for defining and updating tables and so on.

You want it transactional to enable rollbacks.


I recently migrated a small side project from MySQL to PostgreSQL. I wrote a short blog post about how I did the migration, issues I ran into etc: http://blog.minimum.se/2019/07/09/upgrading-node-mysql-to-po...

I'm overall very happy with the migration and PostgreSQL also ended up taking less memory (RSS) in prod, which was a nice bonus (this is useful for me because I'm running this on a low-end Digital Ocean droplet).


The best part of moving to PostgreSQL is that it is sort of the de facto standard wire protocol for relational databases now, and that enables you to "grow into" something like CockroachDB down the line when you need that sort of scale.


Can you explain your comment in more detail? I am not really sure what you're getting at with "the de facto standard wire protocol for relational databases" nor with "grow into something like CockroachDB".


In the case of CockroachDB (a distributed database), it has the same wire-level protocol as PostgreSQL. So if your programming language has a PostgreSQL driver (and it likely does), then you can talk to a CockroachDB database as well.

The SQL dialects are not the same, though they are similar, so you will have to go through your code to see if you need to fix anything. Moving the data itself over is fairly straightforward:

https://www.cockroachlabs.com/docs/stable/migrate-from-postg...


Ok, so the "driver protocol" would be another way of saying that.


Not parent. No idea about the de facto standard comment. CockroachDB's client protocol is* compatible with PostgreSQL. Edit: as well as the SQL syntax, that's the important bit.

Meaning you could in theory mirror data to CockroachDB and then point your clients to it instead of PostgreSQL. CockroachDB is aiming to solve horizontal scaling, hence the "grow" comment.


PostgreSQL underlies at least a plurality of SQL DBMS developments nowadays, especially for scale-out (distributed) projects.


Any sources on that assertion? A quick bit of Google searching [1] shows me only ~10% market share for PostgreSQL, compared to ~46% for MySQL.

[1] https://www.datanyze.com/market-share/databases/postgresql-v...


That's a joke right? I mean, you don't measure market share by looking at how many "websites" use one or the other database.


because wordpress


Yes. One time I joined a startup that was 3 months old and had used MySQL; a lot of the queries had manual joins that could have been replaced with CTEs/recursion. We also had good use for function indexes, partial indexes, and materialized views.

Is it worth it? In our case, BIG YES. But most of the time I don't think it would be. Migrating data isn't that big of a job, but rewriting the queries is. If you use an ORM for the majority of the queries, the switch may be simple.
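Not the actual schema, but a sketch of what "recursion instead of manual joins" typically looks like for hierarchical data:

  WITH RECURSIVE subordinates AS (
    SELECT id, manager_id, name
    FROM employees
    WHERE id = 1                      -- start at the root employee
    UNION ALL
    SELECT e.id, e.manager_id, e.name
    FROM employees e
    JOIN subordinates s ON e.manager_id = s.id
  )
  SELECT * FROM subordinates;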


I used MySQL till about 2012. Inherited an intricate Postgres database at a job and found lots of ways to tweak it that were not as easy in MySQL. I would only touch MySQL today if I'm not working on a greenfield.


This strikes me as bike shedding. If you have a specific problem that you believe will be solved by the migration, do it. Otherwise, users will benefit more if you focus on changes based on their value, not what part of your tech stack looks better on Reddit.


If you have a specific problem that you believe will be solved by the migration, seriously investigate the possibility of migrating. This is a big change, and it's likely to both solve and create problems; take it slow and steady, and keep your abort options open as long as possible.


This is a fair response.


It is bike-shedding in a way, but I think it’s also a direction PHP is taking (becoming more strict, more rigid, less “hacky”).

Switching to Postgres follows the same philosophy I think. Migrating existing services is usually a stupid thing to do, but anecdotally I know a number of PHP shops adding pg to their infra and building the new services there.

I’d compare it to the switch to unicode everywhere or normalizing dates on UTC.


I think a lot of people have a mental model of MySQL that's stuck in the 4.x days. You can get most of these proposed benefits by "just" upgrading to the latest MySQL and making sure that you're running it in strict mode.


Migrating from Oracle to Postgres is much more appealing because you can avoid the high Oracle license costs.

I don't really see that much benefit in converting from MySQL to Postgres.

While Postgres is a legit database, so is MySQL. If you're looking to scale MySQL, I recommend the book High Performance MySQL.

For greenfield projects I'd be comfortable with either MySQL or Postgres.


Here's some discussion from when Uber switched back from PostgreSQL to MySQL:

https://news.ycombinator.com/item?id=12166585

FWIW I prefer PostgreSQL.


The linked article is a good dive into the internals of Postgres vs MySQL.

However, Uber didn't really switch from Postgres to MySQL, they switched from Postgres to their home rolled document database which uses MySQL as the storage layer. [1]

[1] https://eng.uber.com/schemaless-part-one/


Markus Winand's write-up at https://use-the-index-luke.com/blog/2016-07-29/on-ubers-choi... really drives this point home. InnoDB has favorable characteristics for what they're trying to do.


I finished porting our very niche ORM to PostgreSQL after 15 years of MySQL only, and I must say that I have been able to do much more 'peephole optimising' (not sure if the term is applicable, but I mean very specific, very Postgres-specific optimisations vs. what I could do in MySQL), and especially for bigger datasets with more ad hoc querying PostgreSQL has helped us. Also, I like the codebase better; I did a lot of work (proprietary) on MySQL internals and I found it harder to work with. Maybe that changed, but I guess not.


As much as I like PostgreSQL's features, MySQL does have advantages in some situations.

The copy-on-write/MVCC type stuff in particular has bitten me hard in the past. It's great in most cases but can cause some serious issues with huge tables where every row is updated many times. Even with frequent table cleanup/analyze/etc., the tables bloat up and can become completely unworkable, whereas in MySQL there are no problems at all. Symptoms are things like the table sizes growing hundreds of GB larger than the data, queries which should use indexes becoming table scans, etc. The existence of multiple versions of a row seems to really upset the query planner in some cases. I have a table which takes several hours to do a basic conditional count in PostgreSQL, which will finish in milliseconds on MySQL.
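If you want to confirm dead row versions are the culprit, a rough first check is the dead-tuple counters Postgres keeps by default:

  SELECT relname, n_live_tup, n_dead_tup,
         pg_size_pretty(pg_total_relation_size(relid)) AS total_size
  FROM pg_stat_user_tables
  ORDER BY n_dead_tup DESC
  LIMIT 10;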


MySQL's InnoDB is MVCC as well, at least when used with transaction isolation.

That said, MySQL does have more mature replication options built-in.


I wasn't aware of that, but given the transaction capabilities, it makes sense. I primarily use InnoDB and have never had any similar performance issues, so I haven't looked into it heavily...

MySQL InnoDB tables handle the use case much, much better, so I'm just guessing, but it's probably a difference in how the indexes handle updates/multiple versions and how the query planner works for dirty indexes.


If you don't need transactions, then MySQL with the MyISAM engine has several aspects I love:

1: Blazingly fast, as it simply hands DB writes to the OS and benefits from all the caching and optimization the OS does. Depending on your storage medium, inserts/updates/deletes are often 100x faster than with an ACID-compliant engine.

2: Simple data structure. Want to see the sizes of your tables? "ls /var/lib/mysql/your_database" and you see each table as two files. One for the data, one for the indexes.

3: Easy to copy the data "cp -r /var/lib/mysql/your_database somewhere" and you are done.


As long as your application understands https://dev.mysql.com/doc/refman/8.0/en/internal-locking.htm... then MyISAM may work for you; it should just be relegated to a second-tier option, especially for new development. I feel like I have seen so many people overestimate the benefits of MyISAM when they later realize that they can't cleanly write all of the information in a single append unit and have to touch values after the first insert.

I generally hope people stop using it as a starting point and instead relegate it to a particular optimization crutch like in-memory tables.


I've done it twice; the context is completely different on why though.

Job #1, we were a heavy PHP/MySQL shop with a smattering of Ruby. I had come from a PG/MySQL shop and knew the merits and pitfalls of both databases (PG's autovacuum being a particular point of contention at my job before this one), and informed the CIO that going to PG instead of following the upgrade path to MySQL $latest had a number of minor improvements, and since our SQL was /mostly/ standard on the frontend, we could attempt the change.

I was in charge of testing and automating deployments of the test environments, so I set about massaging the data into PostgreSQL types. However, we found that our data had been silently corrupted over a long period of time. Dates were often very incorrect, and values of NOT NULL columns were filled with the "empty" value of whatever type it was (0 for int, "" for string, etc.).

We managed to get the data into PostgreSQL, and a lot of our tools stopped working; the commerce site itself was working thanks to the efforts of two really committed developers, but the back-office tooling was not ported. Due to the sheer number of queries we had which were relying on silent failure modes (which PostgreSQL does not allow), we ended up going back to MySQL rather than fixing them.

I don't work there anymore.

Job #2, we were a heavy C++/MySQL shop, MySQL just because it's what everyone knows, and because everyone knows it, it becomes a standard that is hard to move away from. I was tasked with getting ~50 KiB binary data blobs saved with a high update frequency, strong consistency, and ideally alongside their locks.

We tried to coerce MySQL into doing the right thing for a very long time; we rented this huge iron server which had 42 directly attached SSDs in RAID 0 (to test the limits of the software as a control, not for prod-like environments), 60 CPU threads, one of the best RAID cards you can buy, etc.

What we found was that MySQL couldn't climb above 600 qps for this workload, no matter what we did. We even ended up recompiling the kernel with a faster implementation of sockets and IPC, and changing the memory allocator MySQL used (which was unsafe), but we couldn't get it to move much faster than that.

So, on profiling we found that it was spinning on an internal lock, something we couldn't engineer around.

600 qps was well below our target, so we started looking at alternatives. It should be noted that one of the requirements was that the data be fsync'd before returning OK to the application. I say that because 99% of NoSQL databases only ensure that their data is in the VFS.

We tried PostgreSQL as our only non-NoSQL alternative, since its syntax is similar enough to MySQL's and easy to test. The C++ developers instantly fell in love with the C++ SDK for pgsql, referring to the MySQL equivalent as "braindead", and the database scaled linearly with I/O, even as we added additional arrays of drives, eventually saturating our 40 Gbit NIC.

Now we run PostgreSQL as our only RDBMS.

I still work there.


Both of these cases are the exact reasons why I end up finding MySQL concerning, especially going forward. There are just some use cases that, given the current MySQL locking design, I feel will never be fixed, now that MySQL seems headed further and further towards splintering as an ecosystem. These are the kind of extremely hard, change-the-major-version-number-level issues, and with the MariaDB/MySQL/cloud MySQL-compatible frontend ecosystem, I'm not entirely sure these things will ever get fixed.


Everyone's needs are different, but for me it was everything from partial and expression indexes to JSON and JSONB fields. I switched over to Postgres 100% some years ago and never went back. I find it to be much nicer to work with as a founder of a tech company who had to wear both the developer and DevOps hats for some time. I'm not familiar with the Google Cloud flavor, but we're building out with Postgres in development and staging and using Amazon's Aurora (Postgres compatible) in production.
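For anyone who hasn't used them, quick sketches of the features named above (table and column names invented):

  -- Partial index: index only the rows you actually query
  CREATE INDEX orders_pending_idx ON orders (created_at) WHERE status = 'pending';
  -- Expression index: index a computed value
  CREATE INDEX users_lower_email_idx ON users (lower(email));
  -- JSONB containment query
  SELECT * FROM events WHERE payload @> '{"type": "signup"}';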


A very trivial thing (having used both PostgreSQL and MySQL in production for many years): I find MySQL's EXPLAIN output much easier to grok than PostgreSQL's query plan.


I probably wouldn't migrate a project, but I switched my default db to PostgreSQL after seeing the many ways MySQL can silently screw up data. Truncating strings, truncating numbers, allowing NULLs in a NOT NULL, etc. Much (all?) of this can be fixed with settings these days: https://dev.mysql.com/doc/refman/5.7/en/constraint-invalid-d...

PostgreSQL has a hard bias toward correctness. If you give it invalid data, it will blow up loudly every time.

PostgreSQL also has a boatload of useful features. Need full-text search? It has it, and it's good enough for lots of stuff: http://rachbelaid.com/postgres-full-text-search-is-good-enou...
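The built-in search looks roughly like this (column names are hypothetical); for real use you would typically also index the tsvector expression with GIN:

  SELECT id, title
  FROM articles
  WHERE to_tsvector('english', title || ' ' || body)
        @@ to_tsquery('english', 'postgres & migration');

  CREATE INDEX articles_fts_idx ON articles
    USING gin (to_tsvector('english', title || ' ' || body));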

Need to search for locations with coordinates < N kilometers from a coastline? PostGIS does it.

Need to store and query JSON? Partition tables by a date range? Index only the rows matching a WHERE condition? Use a CHECK constraint? Prevent race conditions from creating reservation rows with overlapping datetime ranges (exclusion constraints)? Run a query periodically and cache the results as a queryable table (materialized views)? The list goes on and on.
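The exclusion-constraint case in particular is hard to get right any other way; a minimal sketch (requires the btree_gist extension; names are invented):

  CREATE EXTENSION IF NOT EXISTS btree_gist;
  CREATE TABLE reservations (
    room_id int       NOT NULL,
    during  tstzrange NOT NULL,
    EXCLUDE USING gist (room_id WITH =, during WITH &&)  -- no overlapping bookings per room
  );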

More correct data + fewer reasons to add another tool to the stack - especially when that would mean having to keep data in sync - is pretty compelling to me.


I made the jump; it's not better exactly, it just has a different set of problems.

For example, there is no unsigned or tinyint, advisory locks in PG are much worse than those in MySQL, true and false are "t"/"f" instead of "1"/"0", and some other things.

Yes, each item can be dealt with - but it's like I said, it's just a different set of problems.


Not being at the mercy of Oracle's whims is enough justification to consider migrating to MariaDB at the very least. MySQL is (miraculously) still alive for now, but it wouldn't be the first case of Oracle letting one of its acquisitions wither and die a slow, painful death.


Recently did this, as MySQL was just not handling large tables as well as we needed, and its new partitioning and clustering were lackluster for our purposes.

Without even implementing the partitioning and clustering - simply moving to Postgres has "reset" our performance problems on our system. We have tables with 10M+ rows in them that dragged on MySQL (Aurora RDS) - with Postgres (Aurora Postgres) things are running smoothly.

If you are dealing with huge amounts of data but not quite into "BigData" territory - I say go for it.

I just hope you have used an ORM or your queries are generalist enough to make the switch easy - even with an ORM it was a painful migration for us!


One of the biggest reasons I'm still using MySQL is Sequel Pro. Whoever is running PostgreSQL would do themselves a service by building a GUI of similar quality and UX polish.

https://www.sequelpro.com/


As a DBA who works with both MySQL and Postgres, I strongly prefer managing recent versions of MySQL for large SaaS systems.

Postgres has significant issues with vacuum and index write amplification, as noted in the Uber blog post a few years ago - nothing has changed.

Additionally, there is no clear compliance story regarding schema configuration, grants and restricting COPY with Postgres, while those are no-brainers with MySQL.

If you're an AWS user, then MySQL has more advanced managed options as of 2019.

In summary, Postgres is fine if you don't have paying users yet - any database would do fine.

But there's a reason Google and Facebook have stayed with MySQL, and that's because it is manageable at scale whereas Postgres simply isn't.

On the other hand, if you want to use Vertica (derived from the Postgres code base) as a column-store database, then managing it is the same as Postgres.


There were some comments regarding Postgres and MySQL in a Postgres release post before.

https://news.ycombinator.com/item?id=19872666


Isn't that about just switching the db driver in the ORM?

Sorry for being sarcastic, but was that not exactly the point of countless database abstraction layer projects?


Yes and no. The point of an actual ORM (object-relational mapper) is to be able to do the equivalent of Lisp/Smalltalk (runtime) image-level persistence: make objects permanent between restarts and make changes transactional.

To really get there, you actually need an object DB, like ZODB [z] for Python (which incidentally can use PG as a storage back-end).

If you're using a DB abstraction layer, and typical ActiveRecord-like surrogate keys, then for simple use cases you can indeed switch back-ends; many Rails projects run tests on SQLite and production on PostgreSQL.

That's not a great idea, but it works for simple projects (first gotcha: most RDBMSs don't have a global namespace for indexes; SQLite does).

But there's no standard AFAIK for full text indexes, geodata, json/bson, materialized views... So it depends a lot on how much of the db you're actually using.

[z] http://www.zodb.org/en/latest/tutorial.html

Note that an object db does not free you from thinking about data structure migrations:

http://www.zodb.org/en/latest/guide/writing-persistent-objec...

https://pypi.org/project/zope.generations/


Uh, no? The idea is to have a full featured object mapping layer that can work with multiple database backends.

Mapping DB result sets to objects is non-trivial; no need to develop it multiple times for each RDBMS when 90% of the code would overlap.

I'd work on the sarcastic attitude though


Yes and no. You still need to ETL the data from one DB to the other, and (quite probably) the SQL dialects will be different, so doing a dump'n'load isn't going to work.


For a real life metaphor, changing address in order to receive mail is easy; moving to a new house is the complicated and expensive part of the change.


And they only work that way if you switch the DB driver frequently during the development of your product.

The abstraction is useful. The abstraction is leaky.


Moved my prior company from MySQL to Postgres in 2004 and never looked back. The stability across time and through upgrades was remarkable.


Meta:

Thank you for linking to old.reddit.com. Personally I really hate the new interface at www.reddit.com.


If you need specific PostgreSQL features then yes, otherwise switch to MariaDB instead.


I was a bit bummed when I started my new gig at a MySQL shop after spending a few years enjoying all the new recent features of PostgreSQL. But then we switched to MariaDB and it basically has everything.

Definitely give it a look before considering switching.


I like them both depending on the use case


Why not Percona?


Yes



