> There's no reason for SELECT COUNT(DISTINCT x)) to perform badly in Postgres, ...

merb · on May 10, 2016

Actually that's wrong:

    EXPLAIN ANALYZE SELECT COUNT(DISTINCT(calc)), calc FROM price_history GROUP BY calc;
                                                          QUERY PLAN                                                           
    -------------------------------------------------------------------------------------------------------------------------------
     GroupAggregate  (cost=10713.04..11381.06 rows=10 width=5) (actual time=1010.383..1073.263 rows=11 loops=1)
       Group Key: calc
       ->  Sort  (cost=10713.04..10935.68 rows=89056 width=5) (actual time=1010.321..1049.189 rows=89041 loops=1)
             Sort Key: calc
             Sort Method: external merge  Disk: 1392kB
             ->  Seq Scan on price_history  (cost=0.00..3391.56 rows=89056 width=5) (actual time=0.007..20.516 rows=89041 loops=1)
     Planning time: 0.074 ms
     Execution time: 1076.521 ms
    (8 rows)

With Index:

    EXPLAIN ANALYZE SELECT COUNT(DISTINCT(calc)), calc FROM price_history GROUP BY calc;
                                                                         QUERY PLAN                                                                          
    -------------------------------------------------------------------------------------------------------------------------------------------------------------
     GroupAggregate  (cost=0.29..2804.82 rows=10 width=5) (actual time=0.117..47.381 rows=11 loops=1)
       Group Key: calc
       ->  Index Only Scan using price_history_calc_idx on price_history  (cost=0.29..2359.52 rows=89041 width=5) (actual time=0.054..18.579 rows=89041 loops=1)
             Heap Fetches: 83
     Planning time: 0.208 ms
     Execution time: 47.416 ms
    (6 rows)

Actually that is called a index only scan, and happens when you have a data type that is inside your index. Which means if you need a aggregate you could try to index everything you need. Mostly a aggregate only contains some values of a row so a index is mostly not a problem.

anarazel · on May 10, 2016

I didn't say an index couldn't be used at all. Just not to actually make the query fast. This will get all duplicates for a value from the index, before going to the next value. If you have a couple thousand or more of each to be counted value that'll make the query rather slow.

Jweb_Guru · on May 10, 2016

An index only scan isn't the same as a loose index scan. They're orthogonal tricks.