The guy has deliberately changed a default setting (sql_mode) to make MariaDB/MySQL look worse. When someone pointed that out in the comments, he left it unanswered. That already says something about PostgreSQL evangelism.
Speaking of comparing MySQL and PostgreSQL functionality, I've been spending some time recently on this very topic. Both are great databases with interesting and sometimes unique features. Here are some of the features PostgreSQL is missing as compared to MySQL:
- synchronous multi-master cluster (Galera, Group Replication)
- distributed in-memory grid with automatic sharding (NDB)
- semi-synchronous replication (will be available in PostgreSQL 10)
- built-in logical replication (will be available in PostgreSQL 10)
- built-in event scheduler
- clustered indexes
- declarative partitioning (limited functionality will be available in PostgreSQL 10)
- optimizer hints
- optimizer tracing
- efficient MVCC implementation that is unaffected by XID wraparound problems and VACUUM issues (see also Uber's report on moving from PostgreSQL to MySQL)
- page-level compression
- page-level encryption
- page-level incremental backups
- user variables
- NoSQL client APIs (HandlerSocket, memcached protocol, X protocol)
- virtual columns
- ability to choose data page size on database creation
- ability to specify column order when adding a new column with ALTER TABLE
- write-optimized storage engines similar to MyRocks and TokuDB
All of that may seem small from a developer's standpoint. Those things, however, are huge for the people who design and operate the busiest websites on the internet, which is probably why MySQL still shines in that field.
Advanced replication strategies in MySQL and Oracle mostly cause hair loss and divorce. Everywhere I have worked that uses these databases eventually convinces itself that it needs futuristic replication technology and then goes on to suffer from outages and weird corruption. It turns out, distributed consistency is a hard problem. Oracle and MySQL both benefit monetarily from people assuming it will work and then activating it, only to find out a month or two later that it's a complicated problem that requires actual planning and oversight.
I think Postgres will be a better product when it incorporates similar functionality, but the implementations in MySQL and Oracle lie somewhere on the border between false advertising and reckless endangerment. I appreciate that Postgres not offering it as two lines of configuration sends a message: you are making a commitment. Nearly everything else in Postgres is set-it-and-forget-it, but you really want to have a DBA if you're going to try to make replication work. If you read the docs on running a hot standby, see how much of a commitment that is, and sense yourself getting tired, you're starting to appreciate what you might be getting yourself into with single-master replication, let alone multi-master.
This has not been my experience of multi-master asynchronous replication in MySQL at work. We use only one replica for actual work and the second master is read-only until failover - it is only write-enabled after the primary is disabled - and it works quite well.
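For context, the read-only part is just MySQL's normal switches, roughly like this (super_read_only assumes 5.7+):

    -- on the standby, until failover:
    SET GLOBAL read_only = ON;
    SET GLOBAL super_read_only = ON;  -- also blocks accounts with SUPER

    -- at failover, once the old primary is confirmed disabled:
    SET GLOBAL super_read_only = OFF;
    SET GLOBAL read_only = OFF;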
Getting replication bootstrapped synergises well with having a smooth backup-restore procedure; similarly, any hiccups in replication can be dealt with using restore.
We have other release QA procedures that use restored backups for testing, so our restore procedure is extremely well exercised. Really fast restores using the Percona tools are a considerable bonus of the MySQL world.
> This has not been my experience of multi-master asynchronous replication in MySQL at work. We use only one replica for actual work and the second master is read-only until failover
That's not multi-master. That's single master with a hot standby.
> any hiccups in replication can be dealt with using restore.
Restore isn't an option when your database is so large that it takes most of a day to copy the backup onto the replacement master to restore from it. (Not speaking hypothetically here.) Obviously restoring from a backup is still there as a last resort in this case, but you really don't want to have to go there.
Galera builds on that, and it works very well. It has weak points, but if you build a working Galera cluster and test it (test the failovers, test [re]bootstrapping a node), you can then be confident that it'll continue to work as intended for a long time.
There are rough edges: upgrades, recovery from a corrupted hot backup, etc. But it's very easy to rebuild the cluster in flight, for example when changing some fundamental cluster property, or after a PEBKAC (with a not-too-big read-only window). It does require one extra node, a regular MySQL/MariaDB server that acts as a donor (which Galera can bootstrap from), but that's not too surprising if you want to at least keep serving read-only requests.
If you really want a no-ops deployment, use something that requires a minimal amount of ops, like ElasticSearch. Just give it an obscene amount of RAM and it'll work fine for ages.
RDS or not, if you want to run a big site, you'll need (dev)ops expertise. And a lot of it. Famously, Netflix, the big full-cloud business, employs the most aggressive performance engineering and other "auxiliary" development groups. (Because AWS is fckn!1 expensive, and when you can run on a thousand fewer nodes, it'll cost you less. And yes, of course Netflix gets a big discount, but you won't, at least not initially.)
Neither Netflix nor any of the top "startups" is anywhere near aggressive performance engineering. In fact, they're usually full of slow, fragile tech that only looks great to blog about.
HFT, adtech, scientific computing, and many other industries are orders of magnitude ahead.
This is the worst joke in tech, really. Developers/managers all jump on the "cloud" bandwagon because "hey, we don't need ops", and then you end up with "Galera is such a POS, you can't use it with RDS"... <facepalm>
> If you really want a no-ops deployment, use something that requires a minimal amount of ops, like ElasticSearch. Just give it an obscene amount of RAM and it'll work fine for ages.
I can assure you this is not the case if your indexes are very large and their number is constantly growing, and/or clients are doing very complex queries that stress the system.
The same thing applies to anything, really, if you try to "shard" (partition, or simply do multi-tenancy) based on files, MySQL tables, MongoDB collections, and of course elasticsearch indexes.
And you have to plan ahead and start with a large number of shards, to be able to scale your ES cluster.
Yet it's much better than trying to get some kind of SQL based cluster going, because those require knowledge of a lot of subtle internal details.
Just because MySQL calls it multi-master doesn't make it so. What you described is single master with a hot spare. Real multi master is when a client can write to both servers and they will be consistent by the magic fairy dust that is replication.
How exactly do you get that multi-TB backup onto the database server's disk so quickly? Even with a gigabit network interface, that's only 125MB/s, which is over two hours[0] to transfer 1TB. Even if you have a 10 gigabit network, that's ~13 minutes[0].
Also consider if the backups are compressed -- a 1TB compressed file also takes time to uncompress. If the data compresses well, you could even run into the disk being the bottleneck, not the network.
[0] Well, more than that; those numbers assume you're actually getting the link rate with no overhead, which of course isn't reality.
RE: Oracle Replication - Have you looked at DbVisit? Compared to the PhD you need for GoldenGate, you can get by with a high school diploma with this stuff.
To restate your other reply, key-value allows you to sweep consistency problems under the rug. What happens if two transactions try to update the same row? One of the transactions fails. What happens if two clients write different values to the same key in a multi-master key-value store? The last one in wins and the first write is just lost.
Many of the NoSQL databases have provisions for these situations so they can handle both writes but bubble the decision up to the application layer. CouchDB, for instance, will tell you you've got multiple versions and has a way for you to tell it how to resolve the conflict. CRDTs let you bake the logic in for handling a conflict. MySQL does not do any of that. If your database consists of "likes" and other low-value data like that, you may not care if you're losing data at some low rate. Reddit is in that situation, and the decision seems to work for them. But I don't think it generalizes to all situations.
The Uber blog post got a lot of things wrong and if that is representative of their knowledge of PostgreSQL I would not suggest using them as an authority on PostgreSQL or databases in general.
> The guy has deliberately changed a default setting (sql_mode) to make MariaDB/MySQL look worse. When someone pointed that out in the comments, he left it unanswered.
I don't think the comments read like that. Someone pointed out that some MySQL variants have changed the default in a recent version and the author criticised the defaults, so the implication is that he wasn't using a bleeding-edge MySQL installation and got the previous defaults.
Agreed, I might have misread the discussion. However, strict mode is on by default since MySQL 5.7 (released in 2015) and MariaDB 10.2 (which is also the version he was using). So he changed that but didn't explicitly state it, even when multiple people mentioned sql_mode.
> However, strict mode is on by default since MySQL 5.7 (released in 2015) and MariaDB 10.2 (which is also the version he was using). So he changed that but didn't explicitly state it.
From the comments of the original article, the actual version he was using is 10.0:
> this is the default version coming with Fedora core 23 (mariadb.x86_64, 1:10.0.28-1.fc23).
OK, I see it now, thanks. I would even edit my original comment if I could. Still, the original article is biased in not mentioning strict mode, the fact that it has been recommended for years to turn it on in any serious deployment, and that it is enabled by default in the latest MySQL/MariaDB releases.
There are reasons to choose PostgreSQL over MySQL (and vice versa), but the article is certainly an example of unfair evangelism.
It's a pretty damning point against MySQL that there's a default setting that you have to know to change, or else the DB will happily throw away your data.
A lot of the evangelism against MySQL is rooted in a deep skepticism of letting a DB that would even have such a feature, much less leave it on by default for years after sort-of admitting that it's probably a bad idea, anywhere near your data. I loved MySQL back in the day when it would work straight out of the box on both Windows and Linux (and god knows not a lot of web software did back then), and Postgres was always, and frankly still is, a bit opaque to actually get from zero to one. But I also remember discovering the varchar overflow and the users being pissed at me for losing their data. Luckily it wasn't a really important field and we discovered it reasonably fast, so the fallout was contained. But the same app actually dealt with payments; it could have been much worse.
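To make the failure mode concrete, this is roughly what the varchar overflow looks like (hypothetical table, reconstructed from memory):

    -- with the old permissive default, MySQL silently truncates:
    SET sql_mode = '';
    CREATE TABLE t (v VARCHAR(3));
    INSERT INTO t VALUES ('abcdef');  -- succeeds with a warning; stores 'abc'

    -- with strict mode, the same insert is rejected instead of losing data:
    SET sql_mode = 'STRICT_TRANS_TABLES';
    INSERT INTO t VALUES ('abcdef');  -- ERROR 1406: Data too long for column 'v'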
It's basically a country song: MySQL is the kind of guy that's tons of fun, always the centre of the party, but who used to disappear on multi-day benders without a word; now he's cleaned up and is disappointed that his high school sweetheart prefers the boring guy who went to college, got a job, and was never late for a date.
True, but then you can sing a similar song about PostgreSQL: the kind of guy that is cool all around and is always there for that high school sweetheart, while that badass MySQL still gets all the fun and is always the center of the party, which is oh so wrong.
I know what I'm talking about. Over the last couple of years I have had to speak with many people in the PostgreSQL community. While there are indeed many smart and nice people with balanced views and opinions there, the majority has this ideological bias, the ideology being that PostgreSQL is superior to MySQL in every single way. All counter-examples and arguments are usually ignored. I think it's something unique to the PostgreSQL community; I've never encountered anything like it in other technical communities.
As someone who might fairly get scolded for this: to us, praising MySQL's features is a bit like enjoying the recessed lighting and granite counters in a house whose roof is actively leaking.
There are certainly features MySQL has that Postgres doesn't, but data integrity is problem 1, and most of us got burned by MySQL on that one at some point before we got here. Or, we deployed MySQL and then found out that the features we were depending on were InnoDB-only, but switching to InnoDB cratered our performance. Or we tripped on one too many of MySQL's numerous special cases or silent limitations.
You can use MySQL productively. You can even benefit from its special features. But the thing taken together is a bit of a minefield.
Postgres has some false advertising issues too ("object-relational"?) but the day-to-day living with it is a lot nicer, because the roof doesn't leak. I can live without the granite counters.
> Or, we deployed MySQL and then found out that the features we were depending on were InnoDB-only, but switching to InnoDB cratered our performance.
Or worse, we were expecting InnoDB but the default on the mysqld which comes with the RHEL version the client used to deploy our software was MyISAM, so the cascade didn't work when deleting a row, and the dangling reference caused an exception deep in the framework we were using, making parts of our software stop working.
Describing the production operation of MySQL as a minefield is an excellent analogy. I maintained a list of the mines I stepped on in the last job I had where MySQL was used and here are the two I encountered with the biggest blast radius:
* If you have pre-5.6.4 date/time columns in your table and perform any kind of ALTER TABLE statement on that table (even one that doesn't involve the date/time columns), it will TAKE A TABLE LOCK AND REWRITE YOUR ENTIRE TABLE to upgrade to the new date/time types. In other words, your carefully-crafted online DDL statement becomes fully offline and blocking for the entirety of the operation. To add insult to injury, the full table upgrade was UNAVOIDABLE until 5.6.24, when an option (still defaulted to off!) was added to decline the automatic upgrade of date/time columns (a sketch of that opt-out follows this list). If you couldn't upgrade to 5.6.24, you had two choices with any table with pre-5.6.4 types: make no DDL changes of any kind to it, or accept downtime while the full table rewrite was performed. To be as fair as possible, this is documented in the MySQL docs, but it is mind-blowing to me that any database team would release this behavior into production. In what world is the upgrade of date/time types to add a bit more fractional precision so important that all online DDL operations on that table are silently, automatically, and unavoidably converted to offline operations in order to perform it? To me, this is indicative of the same mindset that shipped MySQL for so many years with unsafe, silent downgrading of data as the default mode of operation.
* Dropping a table takes a global lock that prevents the execution of ANY QUERY until the underlying files for the table are removed from the filesystem. Under many circumstances, this would go unnoticed, but I experienced a 7-minute production outage when I dropped a 700GB table that was no longer used. Apparently, this delay is due to the time it takes the underlying Linux filesystem to delete the large table file. This was an RDS instance, so I had no visibility into the filesystem used, and it was probably exacerbated by the EBS backing for the RDS instance, but still, what database takes out a GLOBAL LOCK TO DROP A TABLE? After the incident, I googled for and found this description of the problem (https://www.percona.com/blog/2009/06/16/slow-drop-table/), which isn't well documented. It's almost as if you have to anticipate every possible way in which MySQL could screw you and then google for it if you want to avoid production downtime.
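Regarding the first mine (the date/time rewrite): if you can get to 5.6.24 or later, the opt-out is a system variable that is still OFF by default. Roughly, with a hypothetical table name:

    -- available from MySQL 5.6.24
    SET GLOBAL avoid_temporal_upgrade = ON;

    -- with it on, an unrelated online DDL change no longer triggers the
    -- silent full-table rewrite of pre-5.6.4 date/time columns
    ALTER TABLE orders ADD COLUMN note VARCHAR(255), ALGORITHM=INPLACE, LOCK=NONE;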
There are others, too, but to this day, those two still raise my blood pressure when I think about them. In addition to MySQL, I've pushed SQL Server and PostgreSQL pretty hard in production environments and never encountered gotchas like that. Unlike MySQL, those teams appear to understand the priorities of people who run production databases and they make it very clear when there are big and potentially disrupting changes and they don't make those changes obligatory, automatic, and silent as MySQL did.
Data integrity is of course important, and MySQL indeed has a weak story here, especially in older versions.
Is PostgreSQL flawless when it comes to data integrity? No, that would also be false advertising. Some examples:
- data checksums are disabled by default (silent data corruption!), and the user manual specifically warns they are not cheap. And the reason they are expensive is that PostgreSQL does not support O_DIRECT I/O (see the note after this list);
- there are no tools to verify replication integrity, similar to pt-table-checksum for MySQL; that would require statement-based replication, which is not supported in PostgreSQL, even in the next major release
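On the checksums point: they can only be chosen when the cluster is created, which is part of why the default matters so much; at runtime you can merely inspect the (read-only) setting. A minimal check, assuming a 9.4-or-later server:

    -- the cluster must have been initialized with: initdb --data-checksums
    SHOW data_checksums;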
Why would O_DIRECT make checksums any cheaper? You always update the checksum at the same time as the data, so I cannot see how O_DIRECT could matter.
I am pretty sure there are two real reasons checksums are off by default:
1) You need to WAL-log hint bit changes for checksums to work. Hint bits are part of PostgreSQL's MVCC implementation, and not having to WAL-log them is a clever optimization. This extra WAL traffic can, at least in theory, hurt some workloads.
2) Probably most importantly: nobody has provided the benchmarks necessary to show that the cost is not too high.
Checksums on modified pages should only be computed immediately prior to transferring pages to disk. O_DIRECT makes that transfer point explicit, so you can compute the checksum exactly once per transfer regardless of the number of page modifications. If you use the kernel cache like PostgreSQL does, you must apply the checksum pessimistically because you don't know when the page will _actually_ be written back to disk by the kernel in between page modifications.
Computing page checksums burns quite a bit of memory bandwidth, and memory bandwidth is a major resource bottleneck in modern databases. In the O_DIRECT case, the checksum memory bandwidth should be the same as your storage bandwidth; pessimistic checksumming without O_DIRECT can burn memory bandwidth that significantly exceeds the storage bandwidth.
Particularly in cases where you have modern, high-performance storage devices, like PCIe connected flash arrays, you can't afford to burn memory bandwidth on checksumming beyond the minimum technically required.
As I explained in another reply, PostgreSQL does not write directly to the files/file cache. It has its own buffer cache and only calculates the checksum when flushing dirty pages to disk (where PostgreSQL does a normal, cached write syscall and then fsyncs). The lack of O_DIRECT causes other performance issues, but not this one.
Without O_DIRECT, the page may be updated multiple times in the page cache before it hits the disk. The only thing PostgreSQL can do is to update the checksum before expelling the page from its own cache, so it will be doing unnecessary work in write-intensive workloads. But yes, hint bits and WAL is also a good point.
But PostgreSQL does not update the file directly, so this is not an issue. Instead, PostgreSQL does all updates in its buffer cache and has a background writer which flushes modified shared buffers to the file (but without using O_DIRECT), finally fsyncing the modified file. The window between modifying the file and fsyncing is usually not big, so when a page is updated multiple times, that almost always happens before the page is written from the buffers to the OS's file cache.
The above is obviously inefficient, which slows down writes, but checksums are not affected by this since they are only calculated on flush. The costs of not using O_DIRECT are 1) extra work on flushing to disk, 2) IO spikes due to the unpredictable nature of when and in what order the OS decides to flush to disk, and 3) wasted RAM. The main benefit of being able to use the file cache for read loads (by tuning PostgreSQL with small buffers) is that PostgreSQL is nicer to run on shared machines, for example my laptop, since the OS can steal back the read cache when PostgreSQL stops using it.
The object-relational stuff works (table inheritance). Or at least, it did the last time I tried it out a few years back; I've never used it in production.
Your insistence on replying to every single comment on this thread combined with your statements like "I know what I'm talking about" (and you don't) suggests to me that perhaps you're an evangelist yourself?
> - synchronous multi-master cluster (Galera, Group Replication)
Which doesn't work; see [1]. How many of the other features you listed do the authors claim work, but just don't when you look deeper?
A lot of the big companies are using MySQL as a NoSQL database, and that's not a bad way to use MySQL, but it's a pretty bad relational database; see [2].
If you've read the aphyr link with this (and the rest of his conclusion):
> Unfortunately, even in totally healthy clusters, with no node failures or network failures, Galera Cluster does not satisfy its claims of Snapshot Isolation
...how on earth can you straight-facedly claim that it's a plus point for MySQL over PostgreSQL?
I didn't realise a few of these existed, so thanks.
> - page-level encryption
This got me excited, but for anyone else looking: to keep the key securely (not in a file on the DB server, where an attacker can get it along with the DB files) you need to buy MySQL Enterprise Edition. Quoting from https://dev.mysql.com/doc/refman/5.7/en/innodb-tablespace-en...:
> The InnoDB tablespace encryption feature in non-enterprise editions of MySQL uses the keyring_file plugin for encryption key management, which is not intended as a regulatory compliance solution. Security standards such as PCI, FIPS, and others require use of key management systems to secure, manage, and protect encryption keys in key vaults or hardware security modules (HSMs).
Well, personally, what I see is people comparing a group of developers who are adamant about never corrupting any data (everything else is secondary, and they will refuse to push any feature that may cause problems) with a group that is adamant about increasing market share, even if that means shipping a low-quality MVP. The latter group I have seen corrupt data in production several times, and their software still carries the scars of those MVPs' errors.
Yes, MySQL is all featureful¹. The only thing missing is a guarantee that your data will stay good.
1 - Some of those are "no, we won't ever implement it" features that will do nothing but eat your data by design. Why would you ever want them in a persistence layer? Cache it in memory somewhere better.
> Some of those are "no, we won't ever implement it" features that will do nothing but eat your data by design. Why would you ever want them in a persistence layer? Cache it in memory somewhere better.
I feel a significant motivation for MySQL development is lightweight webapps in PHP, like Wordpress, where the developers are beholden to a third-party hosting environment and have a significant front-end focus...
From that perspective I can see why bells and whistles that would be moronic in a normal application would start sticking to the database itself.
MongoDB suffers from some of the same market pressure: being easy to demo and get from 0 to SAP in 10 min is great, but having to choose crazy insecure defaults and be open for data loss in order to get there is a showstopper in Enterprise circles. In both cases the ironic demands for competent administration are much higher because of all the potential for foot-shooting.
As said in many other comments, the idea that there is even a "strict mode" which was not on by default for many years is itself a problem and a red flag for other problems.
MySQL has issues similar to Postgres's vacuum. If the 'checkpoint age' gets too high, throughput drops by ~5x to prevent WAL wrapping. We hit this in production when using 5.5. They've made a lot of improvements to flushing in recent versions, but it isn't a strict advantage. With both DBs, as long as you monitor, you'll be fine.
As a general recommendation, maybe this is OK. But it's unrealistic relative to the world we live in. Sometimes you need to use hints. Oracle, PostgreSQL, what have you: they are not perfect, and neither are the developers writing the SQL.
Is your customer going to wait for the next version of Oracle? Is your dev team going to fix some sev 3 performance bug?
Hints allow you to use your brain to help the optimizer. Why would you make a hard rule not to use them? Why else are they there?
I won't speak for Oracle because it's always been a "if you have the money you can get a custom patch for the behavior you want" game.
But for Postgres, you contact the developers. If your problems are costing you so much money why wouldn't you just hire a core developer on the Postgres team or hire EnterpriseDB on contract to work on your issues? They'll fix the bug, upstream the fix, and roll you updated packages you can push out to your servers. Not entirely different than working with Oracle, except you are working in the open source world.
Hints are bad. Next time you upgrade your database you may end up with worse performance due to other changes in the query planner. Once your data crosses a certain size threshold it may also start performing worse. They are unpredictable. You will never know when they could break. Using them is the wrong solution.
I agree with paragraph three. A hint is a directive; now we are stuck with it, and bad stuff can ensue... maybe. But maybe my customer is pissed now. Total cost to fix: a couple of hours.
If I had the budget and developers to make PostgreSQL as performant as Oracle, I would be living on an island next to Larry.
It does suggest that one feature you're listing is not really sensible to use without the Enterprise Edition - but you're right, it does not say that outright. Original comment redacted.
It is, in fact, sensible without the Enterprise Edition. The only thing it says is that page-level encryption in the Community Edition is not intended as a regulatory compliance solution. Nothing more, nothing less.
Even in case you care about regulatory compliance, there are other options besides Enterprise Edition.
What are the other options, and when would page-level encryption in Community Edition be useful?
If I'm understanding it correctly, page-level encryption is to protect against someone gaining access to the DB server and copying the files off it. If the key is stored in a file on this DB server, I might as well not bother encrypting the pages.
Not sure about your other points; I have no need for them.
What I really don't like about MySQL (and I don't care about a lot of other stuff): MySQL 5.6 (yeah, don't forget your version numbers) really sucks for queries with OR.
It sucks to have to write two SELECTs with a UNION that is 100x faster than the same query with one SELECT and a single OR.
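For the record, the rewrite looks something like this (hypothetical table t with separate indexes on a and b):

    -- one SELECT with OR: 5.6's optimizer often falls back to a scan
    -- instead of merging the two indexes
    SELECT * FROM t WHERE a = 1 OR b = 2;

    -- as a UNION, each branch can use its own index
    SELECT * FROM t WHERE a = 1
    UNION
    SELECT * FROM t WHERE b = 2;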
I have been curious to ask someone who is knowledgeable about databases/MySQL: Spanner has a feature for creating interleaved tables, which helps data locality when performing joins on a distributed database. [1] Is this an important feature for a large database, and does MySQL provide anything similar?
This is really about database semantics. If you think database semantics matter, you'll almost certainly prefer postgres.
A lot of people simply don't care about database semantics though. Possible reasons:
* They are more interested in a particular application (e.g. wordpress), and use whatever default database it seems to prefer.
* They are more interested in a particular framework/library (e.g. django), and use whatever database it seems to prefer.
* They don't see any reason a database should have interesting semantics, because it's just a place to stash data until you need it again.
* They really care about programming languages, but see SQL as a second-class citizen that looks archaic and is really just an obstacle to work around using cool language features.
I believe database semantics do matter. It's hard to make a single argument that will convince everyone, but here are a couple things to consider:
* SQL catches errors that may be impractical to catch even with Haskell. For instance: you could have had a bug in the application three versions ago that led to an inconsistent database state (e.g. a duplicated item ID), and you are just finding the bug now. SQL to the rescue: if the unique constraint exists, then you are fine, and the error would have been raised three versions ago, before it ever got into the database. (A sketch follows at the end of this comment.)
* Few languages have anything that comes close to ROLLBACK. Erlang kind of does, in the sense that it's a mostly-functional language and a process crash mostly resets you to a good state. But the guarantees of ROLLBACK are much stronger and clearer. I believe this is actually why Erlang never really took off widely -- a PHP script hitting the database is a better version of the same concept: the PHP code can crash, and that will cause a lost connection and a ROLLBACK in the database, getting you back to a known-good state while the other requests just continue working (security and other PHP jokes aside).
Rambling a bit, but I believe postgres really is the leader when it comes to database semantics, and I believe that is one of its most important contributions.
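To make the first point concrete, a minimal sketch (hypothetical schema):

    CREATE TABLE items (
        item_id integer NOT NULL,
        name    text NOT NULL,
        CONSTRAINT items_item_id_key UNIQUE (item_id)
    );

    INSERT INTO items VALUES (1, 'first');  -- ok
    INSERT INTO items VALUES (1, 'oops');
    -- ERROR: duplicate key value violates unique constraint "items_item_id_key"
    -- the buggy write fails immediately instead of festering for three versions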
> I believe this is actually why Erlang never really took off widely -- a PHP script hitting the database is a better version of the same concept: the PHP code can crash, and that will cause a lost connection and a ROLLBACK in the database, getting you back to a known-good state while the other requests just continue working (security and other PHP jokes aside).
Comparing Erlang to PHP + transactional DB is regrettable.
Firstly, Erlang's "let it crash" philosophy, together with its sophisticated supervisor/worker framework, aims for resilience in the face of failure due to bugs or bad data. A PHP script hitting a database may be a better design for a simple web application, but it is in no way superior to the Erlang/OTP ecosystem for Erlang's primary use case: soft real-time, highly concurrent, resilient and robust systems. It cannot reasonably be compared to a scripting language + DB, nor does it make any sense to.
Secondly, Erlang is also able to work with transactional databases, from its built-in Mnesia DBMS to Postgres and more. A more appropriate comparison would be between Erlang + DB vs PHP + DB, and then I think you will agree that Erlang would generally fare better than PHP in that case.
WhatsApp was built almost entirely in Erlang + Mnesia. I honestly don't think they could have done it in PHP + Postgres.
Finally, claiming that Erlang didn't take off because it lacked the transactional semantics of PHP + DB does not compute, because I have never seen any claim that Erlang offers transactional DB semantics (unless it happens to be using a transactional DB).
I think you're right, but I also think that you're missing the point of the GP's argument.
Most people aren't building WhatsApp, they're building a CRUD app. And PHP has very Erlang-like "let it crash" semantics if you squint your eyes: one request causing havoc will not affect others at all, especially if you use a transaction to control DB side effects. Compare this to a NodeJS server that fully terminates when one request causes an uncaught exception and you'll see that PHP and Erlang have more in common, philosophically, than people like to admit.
So the way I read the GP is that despite its many safety guarantees, Erlang never got popular because PHP offers the most important of these guarantees in a more accessible way.
In fact, I know of no language that offers better crash resilience and isolation than PHP. I can run 100 shared hosting customers' shitty PHP code on a single Apache server and they'll have a seriously hard time making their bugs impact other customers. Try that with Erlang :-)
Hmmm... interesting. On the last point I disagree: unless you are launching a PHP process per request, you'll have to set mod_php or php-fpm recycling ridiculously high to cope with 100 customers and their shitty PHP code (or the shitty PHP code from the top 10 one-click solutions installed) leaking memory everywhere. On the isolation side, the industry answer is nasty proprietary solutions like CloudLinux and other custom setups where each customer essentially has a request pool (e.g. 5 php-fpm workers waiting around under the uid of the user), which runs counter to most shitty PHP web hosting providers' goal of density ("let's keep this server lightly loaded for the high-end packages... just put 5,000 accounts on it, okay?"). Typical PHP hosting is not pretty; they're about 10 years behind the times, cough cpanel.
"And PHP has very Erlang-like "let it crash" semantics if you squint your eyes: one request causing havoc will not affect others at all, especially if you use a transaction to control DB side effects."
Exactly my point, thank you for clarifying. PHP crashes (in some form or another) all the time, but the application can still remain up because that PHP process only has state for the one request (kind of like a single erlang process having state for only one request).
> WhatsApp was built almost entirely in Erlang + Mnesia. I honestly don't think they could have done it in PHP + Postgres.
They could have built it in anything. Federating and sharding message delivery has been a solved problem for decades.
Their tech stack choice certainly has server cost implications, so that does not mean I'm saying they should have just arbitrarily picked something.
But in terms of engineering a solution there's simply nothing particularly hard about their messaging volumes - people do volumes like that all the time. Just not that many that do it between people.
That's not to say the Whatsapp team is not impressive - to have set up and run a system like theirs with that few people is impressive, and it's very much possible Erlang + Mnesia was instrumental to that.
I'm sure they could have built it in assembly language, it is true. But they would still be developing it today and not be close to done.
All these arguments - not just yours - are strawman arguments, because they avoid addressing my actual objection: there is virtually nothing in common between PHP and Erlang, other than them both being programming languages. The comparison is entirely fallacious.
The fact that PHP can survive crashes without affecting other sessions is either due to the process isolation that you get for free in modern operating systems, or using external technology like HHVM, which is nothing like the Erlang BEAM engine. To quote Robert Virding:
> Any sufficiently complicated concurrent program in another language contains an ad hoc informally-specified bug-ridden slow implementation of half of Erlang.
Lest anyone think this to be arrogance, I'll quote Brian Acton:
> Erlang is a generally good and useful general purpose language. There was serious thought and consideration that went into its construction. As one example, we’ve seen great benefits in high concurrency situations. We’ve also seen the ability to maintain great uptime as part of its hot code loading capabilities.
Doubtless many high-volume messaging systems have been built. From my experience, the effort and defect density involved in doing so was likely 4-5 times greater than it would have been in Erlang. This has been borne out by experiment[1] and by the experience of many developers who have adopted Erlang after using other technologies.
This is not to claim that Erlang is better than everything else, or a panacea. Like anything created in an imperfect world, it has imperfections. But for someone who has written servers that require high reliability, robustness, concurrency, uptime, and maintainability, one could not do much better than Erlang.
Sadly, HN comment section is not the place to debate the finer details of competing language ecosystems, and much of this is in the realm of personal experience and opinion, so let me gracefully withdraw and thank you for your perspective.
I think this overcomplicates the problem. I have not written a high-performance messaging system in PHP, but I have in both C and Ruby, and the quip about a "sufficiently complicated concurrent program" was moot in both cases, because
1) there are plenty of approaches to doing this cleanly without writing a "complicated concurrent program". Here's one (I've done this in production; it worked great, and all the work on federation has been done for you): use qmail, or the qmail approach of decomposing into small independent programs, which can crash and get restarted without causing any damage. Incidentally, yes, this depends on process isolation just like you'd expect with PHP, and yes, this is run-of-the-mill stuff you can do with anything. That's the point: WhatsApp-like message exchange is not rocket science, and how to structure it so you can just let processes die is something we solved literally decades ago.
2) even if you write a concurrent program to handle the queuing, the logic is simple enough that e.g. my first production queuing/messaging system in Ruby was <700 lines of code, with ~10 lines protected by a Mutex making up the only place where the threads interacted with each other at all.
I'm sure there are many spaces where Erlang could make a huge difference here. And as I said it's very much possible it'd have a performance impact, but this specific example isn't a complicated problem space. At all.
"there is virtually nothing in common between PHP and Erlang, other than them both being programming languages. The comparison is entirely fallacious"
I just disagree. When I was reading an erlang book for the first time, that was one of the first things I thought: this sounds like the architecture of PHP working against a database.
Hot code loading? I do that all the time -- just FTP the file, and the module will see that it changed and run the new version.
Let it crash? Sure, PHP can do that too -- one runtime exception on unexpected input doesn't affect the other requests at all.
By the way, I happen to like erlang and find PHP unpleasant to use and scary from a security standpoint. But I think a lot of people underestimate how good the PHP architecture was in a lot of ways.
Reading an Erlang book, no offense, is not going to give you the insight necessary to do a competent comparison of Erlang and PHP (or any other language). I am not sure if you have written any significant production systems using Erlang, and if you have, I apologize and withdraw the implication that insight is lacking.
Please let me explain why this makes me raise an eyebrow.
Let's take just one thing: hot code loading.
When Erlang does hot code loading, it loads it into the currently running process. This means that any socket opened by that process stays open, and the connection is uninterrupted. The process doesn't stop running, or restart. In fact, the process is, for a short time, running both the old and the new code at the same time. When the process gets to the point of completing the recursive loop within which the old function was called, the new one gets switched in seamlessly and is called from there onwards. I know of no other language that has such mutable late binding, but I can't say they don't exist.
This is, I believe, entirely different from an interpreter loading new code into a different process (or restarting the current one) on detecting a new file. If you can assert that after loading new code from a file,
1. The process pid is unchanged (it's literally the same process), and
2. The process data state is unchanged (it's literally the same process memory) and
3. The process file handles and sockets are unchanged - then we can talk about the two being comparable.
"If you can assert that after loading new code from a file, ..."
You are arguing about technical details, and I am arguing about architecture. The only way you will agree with me is if you zoom out a bit, and look at it from the developer's point of view:
* To deploy a bugfix, did the developer have to take the system down for maintenance and/or interrupt service? The answer for both PHP and erlang is "no".
* Does a runtime exception/crash bring down a large part of the system, or is it isolated as much as possible (often to a single request)? The answer for both PHP and erlang is the latter.
Those are two of the biggest claims erlang can make, and PHP can make them, too. Either the erlang community has a lot of difficulty communicating the advantages, or they aren't quite as unique any more (though perhaps they were at one time).
Now, erlang does have its place. It's great for handling arbitrary protocols and getting a reliable, performant network daemon going quickly[1]. The binary pattern matching is awesome. The erlang shell allows you to have a point of entry to find out what's going on in a system while it's running. Really awesome stuff.
[1] I wrote one such daemon, replacing a bunch of python and finding a lot of bugs in the original along the way. The erlang version was about 50% as many lines of code, even though I wrote out all the typespecs and had to reimplement some things that the python version got for free. But the service was tiny, so you're right, I'm not an erlang expert.
> When Erlang does hot code loading, it loads it into the currently running process.
That is also usually true for PHP code using FastCGI or mod_php or similar. It's not true for PHP code run as a CGI, but nobody has done that since the 90's. It may depend on settings for opcode caching and the like.
> This means that any socket opened by that process stays open, and the connection is uninterrupted.
One of the earliest optimisations for PHP was the ability to make database connections persist across requests. See [1] for documentation of how to do this in current versions of PHP but it has been available pretty much from the start.
> When the process gets to the point of completing the recursive loop within which the old function was called, the new one gets switched in seamlessly and is called from there onwards. I know of no other language that has such mutable late binding, but I can't say they don't exist.
Plenty do.
This is valid Ruby that replaces the method "foo" on each iteration of the currently running loop:

    (1..10).each do |i|
      # eval a string so the heredoc interpolates the current value of i;
      # each pass redefines foo while the loop is still running
      eval(<<END)
    def foo
      puts #{i}
    end
    END
      foo  # calls the version of foo defined on this iteration
    end
(the textual eval, rather than simply inlining the method definition, is necessary because "i" is a local variable not accessible from within the newly defined "foo" - if we used e.g. an instance variable instead it'd work, but then foo would effectively be static, which would defeat the point of the demonstration)
Ruby's and PHP's ability to do hot reloads is similar, but it needs the running script to cooperate in the loading unless you use a containing process that embeds the interpreter and does hot loading that way.
But the PHP approach there is traditionally instead to throw away the code on each request.
> 1. The process pid is unchanged (it's literally the same process), and 2. The process data state is unchanged (it's literally the same process memory) and 3. The process file handles and sockets are unchanged
To me these are optimisations that are undesirable unless you need them. They're necessary for performance, but they make the system more vulnerable to working on corrupted data. As such I don't think it's a good distinction, but as mentioned depending on how you run PHP it can/will meet 1. and can meet 3. to some extent, and will meet 2. for session data, so it's up to you how much persists across requests.
To reiterate, I am saying only that it is unfair to Erlang to portray PHP's (or Ruby's) operational characteristics as "like Erlang". It may be true in the loosest possible interpretation, but the details differ radically.
Your argument dismisses as optimizations things that are critical to systems for which Erlang was designed - telecoms. They are not optimizations for Erlang - they are necessary features that permit "write once, run forever" style operation.
Keeping a connection open and available in a database pool is admirable, but not even remotely comparable to keeping a TCP socket open between two participants in a phone call, with RTP voice data flowing, and changing a function - at run time - in that very same process, without disrupting the voice data.
The Erlang-style hot code loading doesn't make an Erlang system more vulnerable to working with corrupted data by not killing the process. That's just incorrect. In any case, well-written Erlang code will self-terminate if it detects anomalies. And it's harder to corrupt data in Erlang because of its immutability, absence of global data, and process isolation.
Compare the rather ugly and unsafe Ruby code with the equivalent Erlang code (provided as a separate module):
What the Ruby code does is by no stretch of the imagination "mutable late binding". It is evaluating (and hence interpreting) static text which happens to redefine a Ruby function, every time through the loop. That's equivalent to recompiling a piece of code in a loop.
Where does this load new code?
You can eval code in practically any interpreter, from Perl to Python, and it's an unsafe and slow operation and not remotely the same as hot code loading in the Erlang sense.
The Erlang code is compiled, usually on another system, and the compiled BEAM code is copied to a live server, and loaded into memory into every running process that uses that module, simultaneously, without crashing or restarting any of the processes.
In fact, if there is a cluster of servers, and the replacement code (let's call the module "athana") is copied to the disks of all the servers in the cluster, it can be loaded into all processes running "athana" on all connected nodes of the cluster by entering this one line in the Erlang shell of any one of the production cluster nodes:
> nl(athana).
That's it.
Please, let's stop this fruitless debate. It is obvious that we are not going to agree, and we are using the same phrases to mean completely different things.
> WhatsApp was built almost entirely in Erlang + Mnesia. I honestly don't think they could have done it in PHP + Postgres.
WhatsApp's backend is a heavily customised ejabberd - an XMPP server written in Erlang. Sure, they could have picked one of the other XMPP servers in a different language (I know of servers in at least Java, Lua, and C).
So, I would say (as a developer who uses PHP in some situations) that while PHP is completely the wrong choice for an XMPP server, that isn't the main reason why WhatsApp used Erlang - they used it because they used an existing open source project as their base.
I doubt WhatsApp chose XMPP first and then ejabberd/Erlang. Even if they had built a custom protocol, the massive-scale concurrency required would have made Erlang just as good a target to build on. Java could have come close, but Erlang was the language with first class Actor-model and native massive-scale concurrency support.
Actually, they do use a custom protocol, one that heavily derives from XMPP. Turns out, they didn't need to build a backend from scratch in their chosen language because a similar one already existed.
Without inputs from the WhatsApp engineers on their early choices, my version of things^ is at least just as valid as yours.
You think it's just a coincidence that WhatsApp, FB messenger, Google Talk, EA Origin, PlayStation, HipChat, Cisco WebEx all use(d) XMPP for their IM/chat/VoIP?
Just because it's open only to a single vendor's client and isn't federated, doesn't mean it isn't still basically XMPP.
Maybe. But I think with much greater difficulty. None of Java, C#, or Go has the built-in distributed processing capability of Erlang.
The last time I looked into it, WhatsApp's infrastructure consisted of 256 2-node clusters, each node of which was capable of supporting between one and two million concurrently open TCP sockets. The 2-node clusters were combined into (IIRC) 16 "islands", some of which were geographically remote, and each of which communicated with the other using a distribution protocol written by WhatsApp in Erlang.
If you agree with my point then it's not hard to see how postgres is more friendly:
You can write a function in python (or a number of other languages) and share code with your application, and use those functions almost anywhere in postgres (including unique indexes). That makes the unique indexes much more powerful, especially if you want to adapt to changes without rewriting all your data.
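For instance, a minimal sketch of the unique-index part (hypothetical names; any function used in an index must be declared IMMUTABLE):

    -- PL/Python has to be installed on the server
    CREATE EXTENSION plpythonu;

    CREATE FUNCTION normalize_email(addr text) RETURNS text
        IMMUTABLE LANGUAGE plpythonu AS $$
        return addr.strip().lower()
    $$;

    -- the uniqueness rule now lives in one function shared with the app
    CREATE UNIQUE INDEX users_email_key ON users (normalize_email(email));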
Or, you can use range types and an exclusion constraint to form a non-overlapping constraint that prevents schedule conflicts:
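Something like this, with hypothetical names (the classic room-reservation example):

    CREATE EXTENSION btree_gist;  -- lets the plain int column join the GiST index

    CREATE TABLE reservations (
        room   int,
        during tsrange,
        EXCLUDE USING gist (room WITH =, during WITH &&)
    );

    INSERT INTO reservations VALUES (1, '[2017-07-01 10:00, 2017-07-01 11:00)');
    INSERT INTO reservations VALUES (1, '[2017-07-01 10:30, 2017-07-01 11:30)');
    -- ERROR: conflicting key value violates exclusion constraint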
There are tons of examples and others list them all the time. The problem is that a lot of people read past them because they don't see why semantics matter.
I didn't actually say I agree with you. I said I get your point.
My point was that you admitted a single argument won't convince everyone that database semantics matter, and then went on to list two features that MySQL handles just fine, regardless of how well it handles the other things you care about.
You may as well have said "a pet badger is better than a pet bulldog in so many ways, for example it will attack/chase home invaders".
Edit: I have no actual idea if you can have a badger for a pet, or how effective they really are at home protection but I imagine they'd be badass at it.
My reasoning is something like: "if these simpler database semantics matter, then maybe you should look at the more sophisticated semantics postgres has to offer".
If you get my point but find flaws in my argument, well, I can live with that. Counterpoints welcome.
You can't compare SQL to Haskell like that. There are multiple b-tree implementations for Haskell that, of course, will not allow you to insert duplicate keys.
There is even a real relational database system in Haskell that actually obeys the laws of relational algebra, unlike SQL.
If you think haskell would make a good database language, you're probably right. It's a lazy, statically-typed, type-inferred, declarative language with tightly-controlled side effects -- just like SQL.
But in its current form, Haskell is just not a DBMS. So, it must rely on a DBMS like postgres or mysql, and the semantics of the DBMS matter very much.
And yes, I can compare SQL to Haskell, and I did a while back:
An "algebraic data store" would be a very interesting piece of software. Bonus points if it's lazy and if you can store functions there too.
Haskell creates some problems for joining different transactions into the same set of guarantees (like SQL does with optimistic data sharing). But I'm sure there's some extension somewhere that makes the perfect data type for that possible - I just don't know about it.
MySQL has been a toy compared to PostgreSQL for many years. This isn't remotely about being pure or niche, but about practical database concerns. Why have Oracle DB, MS SQL, and DB2 been profitable? It's because serious database features are genuinely important. MySQL doesn't cut it.
We use Postgres over MariaDB/MySQL because of working CHECK constraints, Common Table Expressions, extension support, JSON support, custom types and its great documentation.
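On the CHECK constraints point, a minimal illustration of the difference (hypothetical table; "MySQL" here meaning 5.7 and earlier):

    CREATE TABLE accounts (
        balance numeric NOT NULL CHECK (balance >= 0)
    );

    INSERT INTO accounts VALUES (-5);
    -- PostgreSQL: ERROR: new row for relation "accounts" violates
    --             check constraint "accounts_balance_check"
    -- MySQL 5.7: the row goes in; the CHECK clause is parsed, then discarded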
There's an alarming list of behavior MySQL got negligently wrong at some point. That code is mostly still present (you can't prevent a client from enabling it!) and I have to assume some of the maintainers who thought it was reasonable are still involved.
And not everything on the list has a fix available. Off the top of my head, there are still broken storage engines that can't rollback, not all of them can be disabled, and STRICT_ALL_TABLES won't stop you from writing to them. Some foreign key constraints are supported while others are parsed and then silently ignored.
Maintainers often have a perspective that rates backward compatibility as a higher priority than an individual user would, even when they have learned from previous mistakes in policy or implementation.
I've used Postgres for years because it's sane out of the box. I know you can configure MySQL to mitigate these issues and use InnoDB by default, but the mere presence of data-destroying features is a red flag to me.
I know I could put the logic in my application, but I like the certainty that comes with knowing what I put in is what I get out.
This is one of the many reasons I prefer postgres, but it's worth noting that even in postgres, there are some DDL statements that cannot be executed within a transaction. I can't remember the list, but they do exist - it's just a far smaller number compared to MySQL.
(This bit me recently, because our database migration framework (correctly) puts transactions around migrations, and PostgreSQL (correctly) fails with an error in the case you try to do an operation in a transaction which can't be done in a transaction. (I say correctly, as opposed to Oracle/MySQL which will silently do commits etc.))
I have been in this situation. I then wondered if it was indeed right for my database migration tool to forcibly wrap migrations in transactions and not let me override that behaviour.
I do recommend reading through the PostgreSQL manual (a thick PDF, but you can work through the important parts in a week); it's a joy to read compared to the MySQL one.
Postgres supports a sophisticated full-text search feature set, while MySQL does not. Implementing full-text search in Postgres is not that difficult and can prove to be a much cheaper alternative to ElasticSearch up to a certain scale. This was a major reason for me to recently pick Postgres over any other storage technology.
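For anyone curious, a minimal sketch of the built-in version (hypothetical table and columns):

    -- keep a precomputed tsvector and index it
    ALTER TABLE articles ADD COLUMN tsv tsvector;
    UPDATE articles
       SET tsv = to_tsvector('english', coalesce(title, '') || ' ' || coalesce(body, ''));
    CREATE INDEX articles_tsv_idx ON articles USING gin (tsv);

    -- match and rank
    SELECT title
      FROM articles
     WHERE tsv @@ to_tsquery('english', 'database & replication')
     ORDER BY ts_rank(tsv, to_tsquery('english', 'database & replication')) DESC;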
It is also: another server to manage, another view of your data you have to maintain, another point of failure, and another security entry point.
If you don't need all the functionality of ES, you can go a long way with Postgres full-text search. Once you scale enough, or if search is your core business, ditch it and deploy a better solution like ElasticSearch.
Exactly. When getting started initially, it can be a huge time savings to get your basic full text experience running in Postgres directly, if that's already your main storage DB. Setting up ES is not at all trivial for the average developer and can waste a lot of precious time too early in a product's lifecycle.
I have found Solr to be more reliable than ES and faster than Postgres for FTS. It is cool to be able to do it in-database though. Solr is a bit more work to configure than ES though.
I use PostgreSQL but am not sure it was the right choice. In MySQL it is easy to set which collation to use for a query to get the correct order in the user's language when using ORDER BY. In PostgreSQL it depends on the locales installed on the OS, making it much harder to get a correct ORDER BY and requiring more work when setting up the servers.
PostgreSQL 10 will have support for ICU collations which should be much more stable across different installations. Since it is the first version it could have some rough edges, but I suggest giving it a spin.
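A sketch of what that looks like with the PostgreSQL 10 syntax (hypothetical names):

    -- the collation definition comes from ICU, not from the OS locales
    CREATE COLLATION german (provider = icu, locale = 'de-DE');

    -- applied per query, independent of what is installed on the server
    SELECT name FROM users ORDER BY name COLLATE german;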
For me this is the last major "missing piece" for postgres. MySQL's utf8mb4_unicode_ci makes it possible to order results containing a mix of any languages. With postgres you are forced to pick a single language. It's impossible to use postgres for a truly multi-language application, where you are presenting data in (for example) English, German and Chinese on the same UI.
Yes, MySQL's ordering of languages within unicode may be arbitrary, not necessarily very meaningful, and in some cases even "wrong" - but at least you can do it, and the results are consistent and usable. Postgres is outright unusable in this scenario.
> MySQL's utf8mb4_unicode_ci makes it possible to order results containing a mix of any languages. With postgres you are forced to pick a single language. It's impossible to use postgres for a truly multi-language application
If you know of a solution, please share. As far as I know, it is not possible to do multi-language unicode sorting in postgres. The idea being you have a utf8 column with some rows having English, while others have German, Chinese, Japanese, etc. Or an individual column containing mixed language, such as English and Chinese in a single string. And of course, the real deal - a consistent sort for all unicode points, including emotes and other non-language code points. In postgres you're stuck with a single-language collation such as en_US.utf8 - there's no multi-purpose "utf8" collation.
For me, personally, it was because, until very recently (July 2017) PostgreSQL had CTEs and window functions and MySQL did not. I worked on small/medium size apps so performance wasn't an issue and having these features made my life easier as a developer.
In MariaDB 10.2 and MySQL 5.7 and above, this wouldn't happen. It's FUD; the sql_mode and the MySQL/MariaDB version are not specified.
This seems purposefully contrived; just another developer who throws their toys out of the pram, because they don't read the documentation and just expect software to work for them.
If anything the author has done nothing more than demonstrate ignorance, malice, or both.
When choosing a database, of any kind, you are choosing where to store precious data! Read the fine manual!
MariaDB, MySQL, PostgreSQL, and even the likes of MongoDB offer so many diverse options and storage formats and retrieval options.
It's a very, very serious downside for me that there are now two MySQLs (three if you count Amazon's Aurora), just like there are multiple Linuxes and two Pythons.
Isn't the number of Pythons four? One for C, one for the JVM, one for the CLR and one in pure RPython for the JIT. Double them for Python 2 and 3; that makes it eight.
There are two (very closely related) languages. In principle the implementations are compatible with one another, whereas the language versions are incompatible by design. I think that was clear from the parent.
Having worked with MySQL for years, I'd say these are all legitimate issues, but, as other users have said, they can be managed via upgrades and settings.
Personally, I'm less interested in developer-oriented issues (which are well documented) and much more interested in how people feel they compare in terms of performance and scaling.
Until recently (~5 years I guess, possibly longer now) MySQL beat PostgreSQL in raw read performance and scalability - though you'd have to scale MySQL sooner. PostgreSQL has been better than MySQL for high-write workloads (thanks to MVCC, if you tune autovacuum properly) and for more complex queries as the query planner understands the storage layer.
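As an aside, that autovacuum tuning is usually just a couple of settings; a sketch (values here are illustrative, not recommendations):

ALTER SYSTEM SET autovacuum_vacuum_scale_factor = 0.05;  -- vacuum after ~5% of a table changes
ALTER SYSTEM SET autovacuum_naptime = '15s';             -- check for work more frequently
SELECT pg_reload_conf();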
Modern PostgreSQL has a much better story around horizontal scaling thanks to streaming replication and a lot of work from the Citus Data team. Query performance is greatly improved as well.
There will be any number of reasons to favor PostgreSQL over the dolphin's DB (MySQL) and vice versa.
I have owned various databases on Linux, AIX, Windows, SCO Unix and VAX: from Pick, RMS, ISAMs and 4GL DBs, through Ingres, Informix, Sybase, MSSQL, DB2, Oracle, Solid, Postgres and MySQL, to Mongo, Influx and Redis. I hate them all. I found MySQL very slow, much slower than PostgreSQL on equivalent datasets/queries. I had done evaluations before choosing an open-source DB for a particular project.
And that really is the crux of the thing: comparing benchmarks is usually meaningless. Some databases don't scale as well, some suit a different kind of application, some need to be distributed and some don't. It really is a case-by-case decision; what is good in one case is not always good in another.
Perhaps stating the blindingly obvious for those getting very het up about evangelism...
As time goes on this is becoming a less and less controversial opinion. There are still some specific use cases which favour competing RDBMSs, but most people who choose a competitor these days do so for reasons tangential to functionality and performance.
SQL is SQL is SQL. I have no problems jumping between different flavours of SQL, however what defines my use of them in my own projects is the toolsets I have for managing and maintaining databases.
Over the years, I have built up a collection of tools, drivers and utilities that just makes working with MySQL an absolute breeze across Windows and OS X. There are times that I want to use ODBC to dump a MySQL table or view into Excel/Access for a client, or copy data from a production system to a test system, or set up replication or do some performance reporting, and I like that I have tools set up which can do these things for me at literally the push of a button.
Sure I still use the CLI for MySQL a lot of the time, but nowadays, 75% of my time is spent in GUI tools of one sort or another.
If someone can give me an equivalent suite of these sorts of tools for Postgres that can let me be productive in short order, then I don't much care if the SELECT statement of one system is several milliseconds faster than the other. I don't deal with such huge volumes or mission criticality. I just want ease of management/maintenance.
Last I checked MariaDB doesn't support common table expressions which are part of the SQL standard published in 1999, nearly 20 years ago. Has this changed?
If SQL is SQL is SQL as you say, then I would say MariaDB doesn't support SQL.
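For reference, the SQL:1999 construct in question looks like this (the orders schema is made up); as it happens, MariaDB 10.2 did finally add CTE support:

WITH busy_customers AS (
    SELECT customer_id, count(*) AS order_count
    FROM orders
    GROUP BY customer_id
)
SELECT * FROM busy_customers WHERE order_count > 10;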
Also, I think it has had its time: a great tool in the early days of web dev, but not very professional. Sure, it has improved in recent years, but why put so much energy into it instead of other projects like MariaDB or PostgreSQL, especially now that it's owned by Oracle? I'm amazed by webdevs discussing languages, frameworks and the best stack endlessly, and then spawning a MySQL without thinking or looking at alternatives, like it was 2004.
Are you sure the webdevs discussing every language and the webdevs spawning MySQL are the same people?
I mean, speaking purely about languages, there are lots of webdevs who just love PHP and will never consider an alternative to it, even if you make a persuasive argument that PHP is unsuited to the use case they're writing for. It's the same sort of "like it was 2004" mentality in a slightly different context. I actually wouldn't be surprised if that's what it is, because those are the M and P in the LAMP stack that everyone seems to have started learning at around that time.
I don't mean to pick on PHP here; it's just the one I see the most that has this property.
The company that wrote it is a Postgres consulting firm so take it with a grain of salt.
Kind of like an AngularJS consulting firm that took thousands of hours to become an expert in Angular writing an article about how horrible they determined ReactJS is after 3 hours with it.
Edit: With that said, I've been a MySQL user for years (decades?) and it does have some obvious shortcomings that PostgreSQL does not. I often consider switching, but the thing is that in 90% of my projects the database engine is not relevant.
TL;DR: a couple of examples where pgsql doesn't silently coerce data values. How to choose one would be a worthwhile article; if someone has a reference to such, a link would be appreciated. The comments in this thread are great starters.
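To make the coercion point concrete, a sketch (t is a hypothetical table with an integer column n):

INSERT INTO t (n) VALUES ('abc');
-- PostgreSQL: ERROR: invalid input syntax for integer: "abc"
-- MySQL with strict sql_mode: error; traditional non-strict mode: inserts 0 and raises a warning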
I like the fact that MySQL is so light on features that it forces you to put more of the burden on your application.
With richer RDBMSes like Oracle, MSSQL, and Postgres, people tend to throw too many responsibilities on the DB--which is not so good in the one part of the system that depends on global locks.
That, and the article is kind of a half-truth when it comes to the MySQL warts. A big one he missed is the utf8 vs utf8mb4 fiasco (MySQL's utf8 charset only covers three-byte characters), which is still a thing to the best of my knowledge...
I'm not sure what you're getting at. Good RDBMSs do not have any global locks, table level at most, and can easily handle thousands of reads and writes per second on modern hardware while maintaining isolation between transactions.
Once you approach the limits of an RDBMS you're in really large application territory and very few of us will be working for a Facebook or Google.
You can make a mess out of any development system, and a knee-jerk reaction to throw out RDBMS guarantees is as sensible as ignoring Javascript because legacy jQuery made the UI really hard to understand and debug.
I've worked on some large apps and almost always had separation of concerns: the database is ultimately responsible for data integrity. It will validate foreign keys, data ranges and types, but it does not implement write-side business logic. This takes advantage of the amazing work put into query planners but avoids most of the magic that triggers cause.
Where did I say throw out RDBMS? MySQL does fk/pk restriction, transactions, field enforcement (contrary to article), and all of the other good ACID stuff just fine.
I'm complaining about things like PL/SQL letting you send an email inside a trigger (http://www.orafaq.com/wiki/Send_mail_from_PL/SQL), or Postgres's XML processing features. I don't think that belongs in the DB.
It's easy to say "don't do that" to a small team, but when you get to the corporate shops that do Oracle, that becomes a much harder thing to enforce.
There can be more to data integrity than foreign keys and transactions.
We have an application to track and distribute work to employees; there can only be one active work item per item type on an account. We use MSSQL for this and originally enforced the rule with a CHECK constraint, which ultimately failed. We switched to a partial index (CREATE UNIQUE INDEX ON activities(account_id, activity_type) WHERE activity_state = 0), which works flawlessly. There is no sane way to implement this at the application layer, and even then we have additional applications and administrative work that may access this database directly, so we can't rely on the application enforcing data integrity anyway. MySQL doesn't even support CHECK constraints, nor does it have partial indexes, while even SQL Server has supported partial (filtered) indexes since SQL Server 2008 (nearly a decade after PostgreSQL implemented them, but hey, still at least a decade before any of the MySQL family will have them).
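Spelled out (the index name is added for illustration; SQL Server requires one, PostgreSQL makes it optional):

CREATE UNIQUE INDEX ux_one_active_per_type
    ON activities (account_id, activity_type)
    WHERE activity_state = 0;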
Should you put application logic in your database? No, probably not. But you should always make sure your database prevents ANY invalid data from being put into it, your little application won't be the only one touching that database forever in all likelihood.
Ha...that's a bit of a hack! I've had to do a similar trick for a unique constraint on columns where we didn't care if the value was null. SQL Server (for some reason) considers null a value in this case; in other words, there could only be one null value in the table for purposes of a unique constraint.
E.g. (table and index names added so the statement is actually valid):
create unique index ux_col_not_null on my_table (col) where col is not null
Stuff like this always makes me glad PostgreSQL takes adherence to the SQL standard seriously; NULLs are supposed to be considered non-equal for unique indexes. Probably some stupid design from the Sybase days that MS decided was easier to keep for compatibility than to change, though.
b) just the fact that you're (rudely) giving me the religion on CHECK constraints means you've probably never actually scaled a DB for high transaction volume. Once you have to start sharding and denormalizing for performance, we've left ACID territory, and most of the validation has to be in the application anyway.
You can write unmaintainable code anywhere in your stack. The problem I routinely see with approaches similar to yours is the app code ends up poorly recreating functionality that the rdbms was built to do.
The data should be handled by the database, especially when there might be many different applications accessing it. Think of microservices, maybe written in different languages and frameworks, plus some devops running queries by hand on the db.
The ability to stop modifications inside the db is the ultimate safeguard against data corruption. If you really value your data you should also implement validations with CHECK statements and not only in the application. Think about the database as another microservice worth its own logic.
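A minimal sketch of that last line of defense (the accounts schema is hypothetical):

ALTER TABLE accounts
    ADD CONSTRAINT chk_balance_non_negative CHECK (balance >= 0);
-- caveat: MySQL (as of 5.7) parses CHECK constraints but silently ignores them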
You are correcting something I never said. MySQL has primary/foreign key checking, field validation, transactions, etc. I have nothing against RDBMSes. I like them!
XML parsing definitely belongs in a major database. There's a lot of legacy tooling using XML databases, and supporting them is useful. (And the features are not technically bad either.)
This. You don't want to reimplement all your validation in each of your applications (plural!), and you're in for a nasty surprise if you think your entire organization has only one application using your database now and forever.
Back end API (any web framework) for the convenience of developers, the users of the API.
The database to make sure nobody can corrupt the data.
In my experience the validations in the database almost never go past foreign keys, null fields and bounds on numeric data.
Anecdote. I remember when the consensus of Rails developers was not to use foreign keys: Rails didn't have a way to define them so they probably didn't matter, right? I laughed and added them to every application of mine. Rails has foreign keys now, maybe since version 3.
There are some gems to generate CHECK statements from validations.