
I'm just wondering - where does PostgreSQL stand with those numbers vs MySQL? Did someone perform similar benchmarks comparing the two in such workloads?

Searching turns up surprisingly outdated information. Lots of articles say things like "For simple read-heavy operations, PostgreSQL can be an over-kill and might appear less performant than the counterparts, such as MySQL." but don't really back them up... [1]

[1] https://www.digitalocean.com/community/tutorials/sqlite-vs-m...




That'd be an interesting benchmark, but I don't have the MySQL expertise to do that (and I don't want to publish bullshit results).


I appreciate the straightforwardness of this comment :)

One of the things I think would be interesting to show is how the consistency of performance has improved over time too. There has been a lot of focus on removing stalls in recent versions of InnoDB, and this is something that is incredibly valuable but doesn't always show up in benchmarks :( I assume that PostgreSQL has gone through the same evolution in managing garbage collection, etc.

Peter Boros @ Percona has done some nice benchmark visualization, for example: http://www.percona.com/blog/2014/08/12/benchmarking-exflash-...

Slides on how to produce these graphs: https://fosdem.org/2015/schedule/event/benchmark_r_gpplot2/

(Disclaimer: I work on the MySQL team.)


You mean how the transaction rate changes when things like checkpoint or vacuum are triggered in the background? Yeah, I do have those numbers, and the consistency improved a lot (especially when autovacuum was introduced in 8.1, and when spread checkpoints were introduced in 8.3).

The other part of the story is filesystems - back in the 8.x days the filesystem was usually ext3, which handled sync pretty badly (and that made the impact of checkpoints / autovacuum much worse). Today we have xfs/ext4, which handle that much better.


Checkpoint and vacuum are good examples. There are other stalls we have had to address: extending the size of tablespace files, dropping tables and scanning the LRU to free pages.


PostgreSQL stores the data in a different way (each object gets a different file, or several files), so extending the size of tablespace files is not a big issue and dropping objects is as simple as deleting the file.
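
For the curious, you can actually see this per-object layout from SQL. A minimal sketch (the table name is just a placeholder; pg_relation_filepath() is available in recent versions, 9.0+ if memory serves):

    -- Shows which file under the data directory backs the relation.
    SELECT pg_relation_filepath('users');
    -- e.g. base/16384/16385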

The LRU is basically the shared_buffers, and that's where most of the 9.2 improvements were (more or less).


Actually, if there's someone with MySQL (or MariaDB) tuning skills, it'd be interesting to do a proper comparison. Might be a nice conference talk too, so let me know.


My general experience with the two systems is that when performing trivial queries (e.g. simple selects or updates that only hit a single table), MySQL is faster, but not enormously, and that gap can be narrowed with good tuning on the PG server.

As soon as you get into anything fancy, pg eats mysql for lunch. The query optimizer is MUCH smarter.
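
To make "fancy" a bit more concrete, here's a hypothetical query (made-up tables, PostgreSQL-flavored syntax) of the kind where planner quality starts to matter - join order, join algorithm and subquery handling are all planner decisions that can make an orders-of-magnitude difference:

    SELECT c.name, SUM(oi.quantity * oi.price) AS total_spent
    FROM customers c
    JOIN orders o       ON o.customer_id = c.id
    JOIN order_items oi ON oi.order_id = o.id
    WHERE o.created_at > now() - interval '30 days'
      AND EXISTS (SELECT 1 FROM refunds r WHERE r.order_id = o.id)
    GROUP BY c.name
    ORDER BY total_spent DESC
    LIMIT 10;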


If people are curious, some of it comes down to the techniques and algorithms that each implement. This information might be a little out of date and I'm doing this from memory so please pardon/correct any inaccuracies.

So, with joins, MySQL uses nested loop joins. These are often expensive compared to hash or merge joins. PostgreSQL's query planner can choose whichever of the three join strategies has the lowest estimated cost. I believe MariaDB has implemented hash joins. I'm sure there are cool visualizations of the different join types that you could watch in action and see how they'd perform differently.
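
As a sketch of what this looks like in practice (hypothetical tables, plan output abridged), PostgreSQL's EXPLAIN shows which strategy the planner picked; for a large unindexed equi-join it will often go for a hash join where a nested loop would be far more expensive:

    EXPLAIN SELECT o.id, c.name
    FROM orders o
    JOIN customers c ON c.id = o.customer_id;
    -- Abridged example output:
    --  Hash Join  (cost=... rows=...)
    --    Hash Cond: (o.customer_id = c.id)
    --    ->  Seq Scan on orders o
    --    ->  Hash
    --          ->  Seq Scan on customers c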

MySQL (with InnoDB) uses a clustered (index-oriented) table layout. Basically, it keeps the table in a B-tree and so querying by the primary key becomes nice and fast since the physical data layout of the table is in that lookup order and successive rows are likely to be in the same disk block. But this means that if you're looking up something in a secondary index, you then have to traverse the primary index (going through two trees). Databases like SQL Server and Oracle let you choose your table layout, but MySQL and PostgreSQL only support one (MySQL giving you clustered and PostgreSQL giving you heap).
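
A hypothetical InnoDB table to illustrate the two-tree lookup (names made up):

    CREATE TABLE users (
      id    INT PRIMARY KEY,    -- clustered index: leaf pages hold the full rows
      email VARCHAR(255),
      name  VARCHAR(100),
      KEY idx_email (email)     -- secondary index: leaf entries store (email, id)
    ) ENGINE=InnoDB;

    -- One B-tree descent: the row lives in the clustered leaf page.
    SELECT name FROM users WHERE id = 42;

    -- Two descents: idx_email yields the primary key, then the clustered
    -- index is traversed again to fetch the row.
    SELECT name FROM users WHERE email = 'a@example.com';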

MySQL has supported covering indexes for a while, but PostgreSQL only added this capability recently (index-only scans, in 9.2). A covering index means that you don't have to read data from the underlying table if you only need columns that are included in the index (since all of the data is in the index).
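
Continuing the hypothetical table above, a covering index for the email lookup could look like this (MySQL's EXPLAIN reports "Using index"; PostgreSQL 9.2+ can answer it with an index-only scan when the visibility map allows):

    -- The index contains every column the query needs,
    -- so the table itself never has to be read.
    CREATE INDEX idx_email_name ON users (email, name);

    SELECT name FROM users WHERE email = 'a@example.com';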

Ultimately with databases, the key is limiting the amount of time wasted waiting for the disk. If you can cut down on the number of disk blocks you need to fetch to fulfil a query, you win! There are lots of things like these that databases do to try and get better performance (some doing more, some doing less, and some that involve trade-offs) and they're often quite reasonable to understand.


Some benchmarks put MySQL 5.7 at 645k QPS [1] and if you're using the MySQL memcache plugin [2] then you can get well over 1M QPS [3]. Update performance is up to 100k QPS as well [4]. Note that in these benchmarks they are using some expensive hardware (40 cores with hyper-threading).

[1] http://dimitrik.free.fr/blog/archives/2014/09/indeed-mysql-5...

[2] http://dev.mysql.com/doc/refman/5.6/en/innodb-memcached.html

[3] http://dimitrik.free.fr/blog/archives/2013/11/mysql-performa...

[4] http://dimitrik.free.fr/blog/archives/2014/11/mysql-performa...


Even if the hardware was the same, it's entirely meaningless to compare numbers from two different benchmarks performing different workloads.


I totally agree that benchmarks with different parameters are fairly meaningless. I was just trying to give people a ballpark figure in reply to the question about how the two compare in benchmarks, even if they are not equivalent.

One interesting point about this is that the PostgreSQL benchmark tops out at 8 users whereas the MySQL ones start around 8 users and go up from there. Right at the 8-user point, both engines are in the same range, just under 100k QPS on reads; the hardware (number of cores) doesn't change this value much, and neither does the MySQL version (5.5-5.7). That seems like a limit both engines (PostgreSQL & MySQL) are hitting, most likely from parsing the SQL. Thus the only way to improve single-threaded performance is to use something other than SQL, like the memcache protocol.

Link to the benchmarking tool they used: https://launchpad.net/sysbench. Also a guide on how to use it: http://wiki.mikejung.biz/Benchmarking#sysbench


IMHO the assumption that parsing the SQL is the bottleneck here is unfounded - I'd like to see some analysis supporting that first. The queries used in those benchmarks are extremely trivial, and there are far more expensive operations involved. I don't think it can be deduced from a comparison of two quite different benchmarks (different workloads, different dataset sizes, different hardware, ...).

Also, relational databases are general-purpose query engines, not caching solutions like memcached with its very limited set of commands. It might be possible to invent a more efficient binary protocol to send queries/receive results, but then you still have things like optimization, planning, permission checks, ... that are usually more expensive than the parsing.


It's almost certainly not SQL parsing that is introducing a serial component and an upper bound on parallelism. Actually, that is a task that is almost perfectly parallelizable. Parsing has no serial component; planning may have one if it needs locks on shared data structures used for plan optimization.

MySQL and PostgreSQL are insanely different, to the point that it's uninteresting to compare them. MySQL is a key-value store with an SQL frontend, which eventually got ACID features bolted on. PostgreSQL is a full-fledged RDBMS that eventually got refactored with performance in mind. While the end goal seems the same, the results are really different.

All of this is to say something which I think is common knowledge and is found interspersed in the comments here. For any kind of moderately complex query, PostgreSQL will eat MySQL's lunch. For any kind of concurrent, highly parallel, lock-heavy use case, PostgreSQL is much faster and/or safer. For any kind of very simple query, MySQL will eat PostgreSQL's lunch. It's a result of different upbringings.

If your use case can be satisfied with memcache, you aren't really using a relational database and, yes, MySQL is the better choice of the two. Anyhow, in that scenario there is today a vast choice of good persistent key-value stores that are better than MySQL.


Exactly.



