> There's no reason for SELECT COUNT(DISTINCT x) to perform badly in Postgres, as long as you have an appropriate table design and indexes.
Meh. Postgres' planner doesn't know how to generate a skip-scan/loose index scan for DISTINCT. You can write it yourself, but it's a bit painful:
https://wiki.postgresql.org/wiki/Loose_indexscan
If you have low cardinality, that can be a huge efficiency difference.
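For reference, the hand-rolled version from that wiki page is a recursive CTE that uses the index to jump from one distinct value to the next instead of reading every row. A minimal sketch against the price_history/calc table shown below, assuming a btree index on calc and ignoring NULLs:

WITH RECURSIVE t AS (
   -- seed: the smallest calc value, fetched via the index
   (SELECT calc FROM price_history ORDER BY calc LIMIT 1)
   UNION ALL
   -- each step: the next calc value strictly greater than the previous one
   SELECT (SELECT calc FROM price_history WHERE calc > t.calc ORDER BY calc LIMIT 1)
   FROM t
   WHERE t.calc IS NOT NULL
)
SELECT calc FROM t WHERE calc IS NOT NULL;

Each recursive step is a single cheap index probe, so the cost scales with the number of distinct values rather than the number of rows.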
EXPLAIN ANALYZE SELECT COUNT(DISTINCT(calc)), calc FROM price_history GROUP BY calc;
                                                          QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------
 GroupAggregate  (cost=10713.04..11381.06 rows=10 width=5) (actual time=1010.383..1073.263 rows=11 loops=1)
   Group Key: calc
   ->  Sort  (cost=10713.04..10935.68 rows=89056 width=5) (actual time=1010.321..1049.189 rows=89041 loops=1)
         Sort Key: calc
         Sort Method: external merge  Disk: 1392kB
         ->  Seq Scan on price_history  (cost=0.00..3391.56 rows=89056 width=5) (actual time=0.007..20.516 rows=89041 loops=1)
 Planning time: 0.074 ms
 Execution time: 1076.521 ms
(8 rows)
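The index behind the second plan isn't shown in the thread, but judging by its name in the plan it is presumably just a plain btree on calc:

CREATE INDEX price_history_calc_idx ON price_history (calc);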
With Index:
EXPLAIN ANALYZE SELECT COUNT(DISTINCT(calc)), calc FROM price_history GROUP BY calc;
                                                                         QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------------------------------------
 GroupAggregate  (cost=0.29..2804.82 rows=10 width=5) (actual time=0.117..47.381 rows=11 loops=1)
   Group Key: calc
   ->  Index Only Scan using price_history_calc_idx on price_history  (cost=0.29..2359.52 rows=89041 width=5) (actual time=0.054..18.579 rows=89041 loops=1)
         Heap Fetches: 83
 Planning time: 0.208 ms
 Execution time: 47.416 ms
(6 rows)
Actually that is called an index-only scan, and it happens when every column the query needs is contained in the index. Which means that if you need an aggregate, you can try to index everything the query touches. An aggregate usually only needs a few columns of a row, so such an index is rarely a problem.
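As a hypothetical illustration (the price column here is made up, not something from the thread): indexing both the grouping column and the aggregated column lets Postgres answer the query from the index alone, as long as the visibility map is reasonably up to date:

-- hypothetical "price" column; the index covers everything the query reads
CREATE INDEX price_history_calc_price_idx ON price_history (calc, price);

SELECT calc, sum(price)
FROM price_history
GROUP BY calc;  -- can run as an Index Only Scan on (calc, price)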
I didn't say an index couldn't be used at all, just that it won't actually make the query fast. An index-only scan still reads every duplicate of a value from the index before moving on to the next value. If each value being counted occurs a couple thousand times or more, that makes the query rather slow.
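That is exactly what the loose/skip scan avoids: each step jumps straight to the next distinct value instead of wading through the duplicates. Counting distinct values with the recursive CTE sketched earlier would look roughly like this (same assumptions: btree index on calc, NULLs ignored):

WITH RECURSIVE t AS (
   (SELECT calc FROM price_history ORDER BY calc LIMIT 1)
   UNION ALL
   SELECT (SELECT calc FROM price_history WHERE calc > t.calc ORDER BY calc LIMIT 1)
   FROM t
   WHERE t.calc IS NOT NULL
)
SELECT count(calc) AS distinct_calcs FROM t;  -- count() ignores the terminating NULL row

The work done is then proportional to the number of distinct values (11 here), not to the ~89k rows in the table.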