
HLL eliminates read-before-write in many cases, and that's great. Would love to see the same data structure in Cassandra and PG.



In eventually consistent systems like Cassandra, HLLs have the ideal merge semantics too (very similar to union of a Set).
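To make the "similar to union of a Set" point concrete, here is a minimal sketch, not any particular database's implementation. It assumes a toy HLL with 1024 registers and a SHA-1 based hash; the names `new_hll`, `hll_add`, and `hll_merge` are made up for illustration. Merging is a register-wise max, so it is commutative and idempotent, which is why replicas can exchange sketches in any order and still converge.

```python
import hashlib

P = 10          # 2**10 = 1024 registers (arbitrary choice for this sketch)
M = 1 << P


def _hash64(item):
    # 64-bit hash taken from SHA-1; any well-mixed hash would do here.
    digest = hashlib.sha1(str(item).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


def new_hll():
    return [0] * M


def hll_add(registers, item):
    """Add an item; return True only if a register actually changed."""
    h = _hash64(item)
    idx = h >> (64 - P)                        # top P bits pick the register
    rest = h & ((1 << (64 - P)) - 1)           # remaining 54 bits
    rank = (64 - P) - rest.bit_length() + 1    # position of the first 1 bit
    if rank > registers[idx]:
        registers[idx] = rank
        return True
    return False


def hll_merge(a, b):
    # Register-wise max: the HLL analogue of set union.
    return [max(x, y) for x, y in zip(a, b)]
```

Merging two sketches gives the same registers as adding every item to a single sketch, so no replica needs to read the other's state before accepting writes.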


If by PG you mean PostgreSQL, it is available as an extension.


I have a vague notion of what you mean but could you annotate that with an example or something, please? I'd like to make sure :).


What he means is something like the non-reading increments in Hypertable.

How they function:

You write a += 1:

  if 'a' doesn't exist in memory:
    a = 1 (in memory)
    append +1 to commit-log
  else:
    a += 1 (in memory)
    append +1 to commit-log

After some time, 'a' is written to disk and the commit log is checkpointed (so if a server crashes, it doesn't have to replay a very large commit log), and the on-disk 'a' becomes immutable.

But you have to increment the 'a' key again, and the on-disk copy is immutable. So you create a new 'a' in memory, using the same steps as above.

And repeat again. After some time this new 'a' is also persisted to disk and the commit log is checkpointed.

Now you want to read the value of 'a':

If a merger has run, it has already read the different versions of the data on disk and merged them: the counters are merged and written as a single key. So the read only has to fetch that one 'a'.

If the merger has not run, it reads both versions of 'a', merges them in memory, and returns the value.
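The whole write/flush/read cycle can be sketched roughly like this. It is a toy in-memory model, not Hypertable's actual code; `CounterStore` and its method names are hypothetical, and the "disk" is just a list of frozen dicts. The point is that `increment` never looks at older versions, and merging only happens on read or in the background merger.

```python
class CounterStore:
    """Toy model of non-reading increments (all names are illustrative)."""

    def __init__(self):
        self.memtable = {}     # mutable in-memory deltas
        self.commit_log = []   # appended on every write, cleared on flush
        self.segments = []     # immutable "on-disk" versions, oldest first

    def increment(self, key, delta=1):
        # No read of older versions: only the in-memory delta is touched.
        self.memtable[key] = self.memtable.get(key, 0) + delta
        self.commit_log.append((key, delta))

    def flush(self):
        # Persist the in-memory values as a new immutable version and
        # checkpoint the commit log.
        if self.memtable:
            self.segments.append(dict(self.memtable))
        self.memtable = {}
        self.commit_log = []

    def compact(self):
        # The "merger": sum all versions of each key into one segment.
        merged = {}
        for seg in self.segments:
            for key, value in seg.items():
                merged[key] = merged.get(key, 0) + value
        self.segments = [merged] if merged else []

    def read(self, key):
        # Merge every version that still exists, plus the in-memory delta.
        total = self.memtable.get(key, 0)
        for seg in self.segments:
            total += seg.get(key, 0)
        return total
```

A read returns the same value whether or not `compact()` has run; compaction only reduces how many versions the read has to touch.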

Now change '+1' to add_to_set(5). This is even better: the write updates the in-memory value, and if the HLL doesn't change because '5' was already added to the set, nothing has to be written to the commit log at all, because no change was made.
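A rough sketch of that "skip the log when nothing changed" idea, under the same kind of assumptions as before: a toy 16-register HLL, a SHA-1 based hash, and a hypothetical `HllCell` class whose `commit_log` is just a Python list. It is only meant to show where the write can be dropped.

```python
import hashlib

P = 4  # 16 registers; tiny on purpose, real sketches use far more


class HllCell:
    """Toy HLL value with a commit log (all names are illustrative)."""

    def __init__(self):
        self.registers = [0] * (1 << P)
        self.commit_log = []

    def add_to_set(self, item):
        digest = hashlib.sha1(str(item).encode("utf-8")).digest()
        h = int.from_bytes(digest[:8], "big")
        idx = h >> (64 - P)                        # register index
        rest = h & ((1 << (64 - P)) - 1)           # remaining bits
        rank = (64 - P) - rest.bit_length() + 1    # first 1-bit position
        if rank <= self.registers[idx]:
            return False   # sketch unchanged: nothing to log
        self.registers[idx] = rank
        self.commit_log.append((idx, rank))
        return True
```

Adding the same item twice logs only once. A different item can also leave the sketch unchanged when its rank does not beat the current register, so the saving is not limited to exact repeats.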


Thanks!



