Relevancy - I'd guess it's difficult to measure, due to the large dataset, the large number of possible search terms, and large number of possible results.
This looks fantastic! I also implemented one from scratch[0], and I was doing something very similar at first. Then I found the Python hash() function returns different outputs for the same input when you restart the interpreter. I really like your implementation, it's functional and easy to understand!