
Congratulations to the PostgreSQL Global Development Group on a much-anticipated release.

Curious about this:

> parallelism can speed up big data queries by as much as 32 times faster

Why would it be only 32 times faster? The sky's the limit if there aren't major bottlenecks along the way.




I tested parallel queries on PostgreSQL 9.6 on a few TBs of data (5 billion rows) on an older dual Xeon E5620 server. I also striped 4 Intel S3500 800GB drives with ZFS and enabled LZ4 compression, which achieved a compression ratio of 4x.
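For anyone wanting to reproduce a similar setup, the pool side looks roughly like this (a sketch only; the device, pool and dataset names are made up, and the ashift value is an assumption since the exact layout isn't stated above):

    # stripe four SSDs into one pool (no redundancy), then enable LZ4
    zpool create -o ashift=12 tank da0 da1 da2 da3
    zfs set compression=lz4 tank
    zfs set recordsize=128k tank          # large records give LZ4 much more to work with
    zfs create tank/pgdata
    zfs get compressratio tank/pgdata     # reports the achieved ratio, e.g. 4.00x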

For a sequential full table scan I could process about 2000MB/s of data (only 125MB/s was read from each SSD); I was limited by CPU power.

Anyway, the same query took about 25 minutes on PostgreSQL 9.5, and on 9.6 it was down to 2 minutes and 30 seconds. For comparison, SQL Server 2012 spent 7 minutes on the same dataset on the same hardware.
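For reference, parallel query in 9.6 is off by default (max_parallel_workers_per_gather defaults to 0), so a minimal configuration to let a scan use multiple workers looks something like the following. The worker counts and the table name big_table are purely illustrative:

    # postgresql.conf (9.6)
    max_worker_processes = 16
    max_parallel_workers_per_gather = 8

    -- optional per-table override, then check the plan actually uses a Gather node
    ALTER TABLE big_table SET (parallel_workers = 8);
    EXPLAIN ANALYZE SELECT count(*) FROM big_table;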


Would you be willing to re-run that with SQL Server 2016? A Dev license is free, and there's been a lot of relational engine optimization since 2012. I'd be curious to see what the latest release can do compared to Postgres' latest.

I realize I'm asking a stranger on the internet to do something for free for me. If you don't have time or inclination to do this, no worries, but it seems like you've got a nice setup to be able to play with this. I'm sure I'm not the only one curious to see such a comparison.


I've tried SQL Server 2016; no difference.


Thanks, this and your other response are very useful!


I'm a fan of both PostgreSQL and SQL Server, but I think these numbers are very workload-specific. I've gotten 1GB/s throughput on SQL Server 2012 on spinning disks and CPUs older than the E5620, so I've no doubt that same workload would exceed 2GB/s on your hardware. The apples-to-apples comparison here is between the two versions of PG where the performance improvement is clear. It's harder to do an apples-to-apples comparison between PG and SQL Server because the optimal schema and queries for a particular workload are likely to differ for each of them.


With SQL Server I don't get 2000MB/s on the same hardware, more like 600-800MB/s. The difference is most likely due to LZ4 compression and large block sizes (64k-128k) on ZFS, which result in a lot less IO; with SQL Server, IO was the bottleneck.

So yes, it is very workload specific. For random read/write they are probably more similar. But for reading a lot of data that can be read sequentially, PostgreSQL seems to win hands down, because it can get a lot of help from ZFS compression.

I would love to run the same test when SQL Server is available on Linux. But ZFS does also deliver slightly better throughput and slightly more IOPS on FreeBSD, which is the platform I ran this benchmark on. And SQL Server probably demands a 4k block size, which is so small that LZ4 compression has no effect (I've already tried running SQL Server on ZFS via iSCSI).
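If anyone wants to see how much the record size matters for LZ4, it's easy to compare on a scratch pool; the dataset names here are made up:

    zfs create -o recordsize=128k -o compression=lz4 tank/pg
    zfs create -o recordsize=4k   -o compression=lz4 tank/mssql
    # copy the same data into both, then compare
    zfs get compressratio tank/pg tank/mssql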


Hmm, I've never run SQL Server on anything other than NTFS, where 64KB was definitely the recommended block size. In any case, it's great to have choices. When the license fee is coming out of my pocket, I'm definitely not choosing SQL Server.


No one has tested a query that got more than 32x faster, so they don't want to promise something they can't prove.


There's also a limited amount of memory-level parallelism available... with 4-DIMM sockets you might need an 8-socket machine to get a 32x improvement on large (memory-bound) sequential scans, which I'd guess you can get on top-end Power machines.

(You can probably get more memory level parallelism with random access, but your overall bandwidth will likely be lower... fully exploiting memory bandwidth is complicated and difficult to do for real applications).
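As a very rough illustration with round numbers (assumptions for the sake of argument, not measurements):

    per-worker scan rate        ~2 GB/s of decompressed data
    32 workers                  ~64 GB/s of memory bandwidth needed
    one 3-channel DDR3 socket   ~25-30 GB/s peak
    => a 32x scan speedup implies several sockets' worth of memory bandwidth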


That's pretty much the case, yes.


A blog post on 2ndQuadrant shows in a bit more detail how parallelism in PostgreSQL scales across cores: http://blog.2ndquadrant.com/parallel-aggregate/
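For the curious, a parallel aggregate plan in 9.6 looks roughly like this in EXPLAIN output (abbreviated, table name invented):

    Finalize Aggregate
      ->  Gather
            Workers Planned: 4
            ->  Partial Aggregate
                  ->  Parallel Seq Scan on big_table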


They likely benchmarked it on a 32-core system, like a dual Opteron board. If the task was single-threaded before, a 32-fold improvement is reasonable.


It's very difficult to get a 32x speedup from 32 cores, as there are always parts that are inherently serial, so it's more likely they tested it on a 64-core machine or something like that.


Yes, this is thanks to Amdahl's Law.

https://en.wikipedia.org/wiki/Amdahl%27s_law
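To put numbers on it, Amdahl's Law with parallel fraction p on N cores gives:

    speedup(N) = 1 / ((1 - p) + p / N)

    p = 0.99, N = 32  ->  ~24x
    p = 0.99, N = 64  ->  ~39x
    p = 1.00, N = 32  ->   32x

So hitting 32x on exactly 32 cores would require essentially zero serial work, which is why a bigger machine is the more plausible explanation.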


Nothing ever scales linearly without limit.



