
These are two different things: this one reduces the size of the HyperLogLog, while the other improves its accuracy.



HLL is a space/accuracy tradeoff. If you trim plain HLL down to the same space HLL+TailCut would use, but estimate cardinalities with the Ertl method, which is more accurate: Ertl with a smaller `m` or HLL+TailCut with a larger `m`?
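
To make the question concrete, here is a rough back-of-the-envelope sketch. It assumes plain HLL uses 6-bit registers and HLL+TailCut uses 4-bit registers (both widths are my assumptions, not numbers from this thread), and it applies the classic HLL mid-range standard error of about 1.04/sqrt(m) to both, which is also an assumption for TailCut. It ignores the usual power-of-two constraint on `m`, and the function names are made up for illustration.

```python
import math

HLL_ERROR_CONSTANT = 1.04  # classic HLL relative standard error is ~1.04 / sqrt(m)


def registers_for_budget(budget_bits: int, bits_per_register: int) -> int:
    """Number of registers that fit in a fixed memory budget.

    Ignores the power-of-two constraint real implementations place on m.
    """
    return budget_bits // bits_per_register


def relative_std_error(m: int, constant: float = HLL_ERROR_CONSTANT) -> float:
    """Approximate mid-range relative standard error for m registers."""
    return constant / math.sqrt(m)


# Assumed register widths (hypothetical, adjust to the real implementations).
PLAIN_BITS = 6
TAILCUT_BITS = 4

m_tailcut = 2**14                          # TailCut sketch with 16384 registers
budget_bits = m_tailcut * TAILCUT_BITS     # memory that sketch occupies
m_plain = registers_for_budget(budget_bits, PLAIN_BITS)  # plain HLL, same memory

err_tailcut = relative_std_error(m_tailcut)
err_plain = relative_std_error(m_plain)
ratio = err_plain / err_tailcut

print(f"plain HLL: m={m_plain}, error ~{err_plain:.4%}")
print(f"TailCut:   m={m_tailcut}, error ~{err_tailcut:.4%}")
print(f"plain/TailCut error ratio ~{ratio:.3f}")
```

Under those assumptions the smaller-`m` sketch loses by roughly a factor of sqrt(6/4) in the mid-range, so a better estimator would have to make up that gap to come out ahead. This only models the asymptotic error, not the small- and large-cardinality regions.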


I think the latter. I will implement this next week. Nice call :D


I suspect you are right that HLL+TC will be more accurate. Based on Figure 5 vs. Figure 1 in the Ertl paper, I think Ertl's method improves HLL estimation for low cardinalities and for very high cardinalities, but not for the region where HLL is already at its most accurate.



