
Congratulations to the PostgreSQL Global Development Group on a much-anticipated release.

Curious about this:

> parallelism can speed up big data queries by as much as 32 times faster

Why would it be only 32 times faster? The sky's the limit if there aren't major bottlenecks along the way.




I tested parallel queries on PostgreSQL 9.6 on a few TBs of data (5 billion rows) on an older dual Xeon E5620 server. I also striped 4 Intel S3500 800GB drives with ZFS and enabled LZ4 compression, which achieved a compression ratio of 4x.
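For anyone wanting to reproduce a similar setup, the pool side looks roughly like this (a sketch only; the device, pool and dataset names are made up, and the ashift value is an assumption since the exact layout isn't stated above):

    # stripe four SSDs into one pool (no redundancy), then enable LZ4
    zpool create -o ashift=12 tank da0 da1 da2 da3
    zfs set compression=lz4 tank
    zfs set recordsize=128k tank          # large records give LZ4 much more to work with
    zfs create tank/pgdata
    zfs get compressratio tank/pgdata     # reports the achieved ratio, e.g. 4.00x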

For a sequential full table scan I could process about 2000MB/s of data (only 125MB/s was read from each SSD); I was limited by CPU power.

Anyway, the same query took about 25 minutes on PostgreSQL 9.5, and on 9.6 it was down to 2 minutes and 30 seconds. For comparison, SQL Server 2012 spent 7 minutes on the same dataset on the same hardware.
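For reference, parallel query in 9.6 is off by default (max_parallel_workers_per_gather defaults to 0), so a minimal configuration to let a scan use multiple workers looks something like the following. The worker counts and the table name big_table are purely illustrative:

    # postgresql.conf (9.6)
    max_worker_processes = 16
    max_parallel_workers_per_gather = 8

    -- optional per-table override, then check the plan actually uses a Gather node
    ALTER TABLE big_table SET (parallel_workers = 8);
    EXPLAIN ANALYZE SELECT count(*) FROM big_table;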


Would you be willing to re-run that with SQL Server 2016? A Dev license is free, and there's been a lot of relational engine optimization since 2012. I'd be curious to see what the latest release can do compared to Postgres' latest.

I realize I'm asking a stranger on the internet to do something for free for me. If you don't have time or inclination to do this, no worries, but it seems like you've got a nice setup to be able to play with this. I'm sure I'm not the only one curious to see such a comparison.


I've tried SQL Server 2016; no difference.


Thanks, this and your other response are very useful!


I'm a fan of both PostgreSQL and SQL Server, but I think these numbers are very workload-specific. I've gotten 1GB/s throughput on SQL Server 2012 on spinning disks and CPUs older than the E5620, so I've no doubt that same workload would exceed 2GB/s on your hardware. The apples-to-apples comparison here is between the two versions of PG where the performance improvement is clear. It's harder to do an apples-to-apples comparison between PG and SQL Server because the optimal schema and queries for a particular workload are likely to differ for each of them.


With SQL Server I don't get 2000MB/s on the same hardware, more like 600-800MB/s. The difference is most likely due to LZ4 compression and large block sizes (64k-128k) on ZFS, which result in a lot less IO; with SQL Server, IO was the bottleneck.

So yes, it is very workload specific. For random read/write they are probably more similar. But for reading a lot of data that can be read sequentially, PostgreSQL seems to win hands down, because it can get a lot of help from ZFS compression.

I would love to run the same test when SQL Server is available on Linux. But ZFS does also deliver slightly better throughput and slightly more IOPS on FreeBSD, which is the platform I ran this benchmark on. And SQL Server probably demands a 4k block size, which is so small that LZ4 compression has no effect (I've already tried running SQL Server on ZFS via iSCSI).
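If anyone wants to see how much the record size matters for LZ4, it's easy to compare on a scratch pool; the dataset names here are made up:

    zfs create -o recordsize=128k -o compression=lz4 tank/pg
    zfs create -o recordsize=4k   -o compression=lz4 tank/mssql
    # copy the same data into both, then compare
    zfs get compressratio tank/pg tank/mssql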


Hmm, I've never run SQL Server on anything other than NTFS, where 64KB was definitely the recommended block size. In any case, it's great to have choices. When the license fee is coming out of my pocket, I'm definitely not choosing SQL Server.


No one has tested a query that got more than 32x faster, so they don't want to promise something they can't prove.


There's also a limited amount of memory-level parallelism available... with 4-DIMM sockets you might need an 8-socket machine to get a 32x improvement on large (memory-bound) sequential scans, which I'd guess you can get on top-end Power machines.

(You can probably get more memory level parallelism with random access, but your overall bandwidth will likely be lower... fully exploiting memory bandwidth is complicated and difficult to do for real applications).
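As a very rough illustration with round numbers (assumptions for the sake of argument, not measurements):

    per-worker scan rate        ~2 GB/s of decompressed data
    32 workers                  ~64 GB/s of memory bandwidth needed
    one 3-channel DDR3 socket   ~25-30 GB/s peak
    => a 32x scan speedup implies several sockets' worth of memory bandwidth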


That's pretty much the case, yes.


A blog post on 2ndQuadrant shows in a bit more detail how parallelism in PostgreSQL scales across cores: http://blog.2ndquadrant.com/parallel-aggregate/
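For the curious, a parallel aggregate plan in 9.6 looks roughly like this in EXPLAIN output (abbreviated, table name invented):

    Finalize Aggregate
      ->  Gather
            Workers Planned: 4
            ->  Partial Aggregate
                  ->  Parallel Seq Scan on big_table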


They likely benchmarked it on a 32-core system, like a dual Opteron board. If the task was single-threaded before, a 32-fold improvement is reasonable.


It's very difficult to get a 32x speedup from 32 cores, as there are always parts that are inherently serial, so it's more likely they tested it on a 64-core machine or something like that.


Yes, this is thanks to Amdahl's Law.

https://en.wikipedia.org/wiki/Amdahl%27s_law
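To put numbers on it, Amdahl's Law with parallel fraction p on N cores gives:

    speedup(N) = 1 / ((1 - p) + p / N)

    p = 0.99, N = 32  ->  ~24x
    p = 0.99, N = 64  ->  ~39x
    p = 1.00, N = 32  ->   32x

So hitting 32x on exactly 32 cores would require essentially zero serial work, which is why a bigger machine is the more plausible explanation.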


Nothing ever scales linearly without limit.



