Hacker News
Autocomplete of entire Linux kernel using Redis's new ZRANGEBYLEX (redis.io)
55 points by artursapek on April 19, 2014 | 9 comments


So? I mean it's probably nice to have, but the performance of Redis isn't important enough to be a factor in the scalability of this.

I just tested a binary search of 8 million random 100 byte DNA strings written in Go. 10,000 requests for autocompletions of random 5 byte prefixes yielded 78 million results in 38 milliseconds. And this is with the strings dotted around in memory after sorting them. Further optimizations are possible, but if they were hoping to prove some kind of performance benchmark with this, they have failed.


Algorithmically this is nothing exceptional, nor anything you can't reproduce easily, but the same is true of, say, linked lists, which are the basis of the Redis List type. The point is to have these capabilities with a network interface, persistence, replication, and so forth.


I think it's just meant to demonstrate a novel use case for Redis. This will often be performant enough, in which case using out-of-the-box Redis will be favorable over writing and testing custom software.

However I'd also be interested to see that code, if it's open source. I've been spending lots of time with Go lately. :)


I first believed it used the entire source, but it only performs a prefix search on a per-line basis. So we have a sorted list of 8M strings, around 600 MiB assuming 80 characters per line (which is probably way too much), requiring 23 comparisons to find a match using binary search. That doesn't seem very impressive to me.


This is not impressive, as any balanced tree will do the trick, but your example with binary search is not ideal, because binary search uses an array, which would require an O(N) insert operation.

The point of Redis sorted sets is that inserting is cheap, and as soon as you insert, query operations see the new data without any slowdown. This requires a tree-like structure for this exact use case (it is actually more complex than this: sorted sets are implemented with a skip list plus a hash table).

What is interesting about having this in Redis is that you get the same operation in a fast, networked, in-memory DB, so your clients can now build things that require range queries.

For instance, you can use one sorted set where you increment the score of every search query you see, and another sorted set where the top-N items are inserted, to autocomplete searches using the new ZRANGEBYLEX command. And because you can have multiple keys and this is pretty memory-efficient, you can segment your users into different categories and do this on a per-category basis, so that different categories get different autocompletion dictionaries that may better fit what they are looking for.


Nice to have but I'd stick with Solr/ES/Lucene for any serious string searching/autocomplete work.


What's the performance of a MySQL LIKE query on a similar sized table (with an index)?


It's not really possible to compare LIKE to ZRANGEBYLEX. LIKE in the general case is an infix search over all the text in a column, while ZRANGEBYLEX is a special case: a search for all values within a given range. ZRANGEBYLEX uses lexicographic comparison, so you can only search for strings that fall between certain values, which in practice means prefix searches.


MySQL can search by prefix if the pattern uses only a trailing wildcard (LIKE "string%"), and then it will use an index if one is available.



