The section on "dictionary encoding" didn't make much sense to me. Is there an explanation of this technique somewhere? The example code took a list called "buffer" and then didn't use it for anything - perhaps something got lost when writing the code?

The explanation was pretty opaque: "Here’s one simple example I encountered recently. It’s common in columnar database code to ‘dictionary encode’ lists of values. Here, for some subset of the stored data, we store a dictionary containing the unique values that exist, assigning unique indices to each value, and we store a list of pointers into the list to represent each element."

We store a list of pointers into which list? The dictionary contains the unique values as keys or as values? How do we pick which subset of the stored data we do this for?




The buffer in this case is the input to a max() function. The author's point is simply that sorting the dictionary indices before looking them up is a good idea for common data structures like arrays and disk blocks, or anything else with caching or readahead, because the lookups then proceed in ascending order instead of jumping around.
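
For anyone else who got lost, here is a rough Python sketch of how I read the technique. It is not the article's code: the function names, the sample column, and the choice to deduplicate the indices before sorting are all my own assumptions. The "dictionary" is just a list of the unique values, and the "pointers" are integer positions into that list, one per original element.

```python
def dictionary_encode(values):
    """Return (dictionary, codes) for a list of hashable values.

    dictionary: the unique values, in first-seen order.
    codes: one integer per input element, indexing into dictionary.
    """
    dictionary = []
    index_of = {}  # value -> position in dictionary (only needed while encoding)
    codes = []
    for v in values:
        if v not in index_of:
            index_of[v] = len(dictionary)
            dictionary.append(v)
        codes.append(index_of[v])
    return dictionary, codes


def max_via_sorted_lookups(dictionary, codes):
    """Max of the decoded column, reading the dictionary in ascending index order."""
    # Assumption: deduplicating is my addition; sorting alone already makes
    # the dictionary accesses sequential instead of random.
    ordered = sorted(set(codes))
    return max(dictionary[i] for i in ordered)


# Hypothetical sample data, purely for illustration.
column = ["pear", "apple", "pear", "fig", "apple"]
dictionary, codes = dictionary_encode(column)
largest = max_via_sorted_lookups(dictionary, codes)
```

With an in-memory Python list the ordering makes no practical difference; it only starts to matter when the dictionary lives somewhere that sequential access is cheaper than random access, such as a large array or disk blocks.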



