
> That’s actually quite unproblematic, if you have the tsvector as its own column (not just as index).

Yes, it works, but it is slow because the tsvector is usually big enough to be stored in a TOAST table, and this produces a lot of random access reads.

This is why there is a project to store additional information (term positions) in the GIN index, so that the index itself contains everything needed for ranking:

https://wiki.postgresql.org/images/2/25/Full-text_search_in_...




> usually big enough to be stored in a TOAST table

Ah, luckily, in my case, that can’t happen – each row’s message contains one IRC message, so at most 512 bytes. That also automatically ensures we’ll never run into TOAST issues.


Now I understand why you didn't suffer from the slow ranking issue. In my test case, text is longer and triggers the TOAST management code.
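For anyone who wants to check which situation their own table is in, here is a rough sketch. The table name `messages` and the column name `tsv` are assumptions for illustration, not something from this thread, and the 2048-byte constant is only an approximation of PostgreSQL's TOAST threshold (the docs say "normally 2 kB"). Note that `pg_column_size` reports the stored, possibly compressed size, and that TOAST is triggered by the size of the whole row, so treat the result as a hint rather than a verdict.

```python
# Hypothetical names: table "messages" with a tsvector column "tsv".
# Approximate threshold; PostgreSQL documents it as "normally 2 kB".
TOAST_TUPLE_THRESHOLD = 2048

# pg_column_size() is a built-in PostgreSQL function that returns the
# number of bytes used to store a value (after any compression).
SIZE_QUERY = """
SELECT avg(pg_column_size(tsv)) AS avg_bytes,
       max(pg_column_size(tsv)) AS max_bytes
FROM messages
"""


def likely_toasted(row_bytes, threshold=TOAST_TUPLE_THRESHOLD):
    """Rough heuristic: TOAST only kicks in for rows wider than the threshold."""
    return row_bytes > threshold


def report(cursor):
    """Run the size query on any DB-API cursor and summarise the result."""
    cursor.execute(SIZE_QUERY)
    avg_bytes, max_bytes = cursor.fetchone()
    if max_bytes is None:  # empty table: nothing to report
        return None
    return {
        "avg_bytes": float(avg_bytes),
        "max_bytes": int(max_bytes),
        "largest_may_be_toasted": likely_toasted(int(max_bytes)),
    }
```

With short rows such as single IRC messages, `largest_may_be_toasted` would be expected to come back false; with long documents it would likely be true, which matches the slow-ranking case described above.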


Yeah, if your vectors are in TOAST, you really have a huge issue with ranking. There’s no simple way to get around that, short of moving to a dedicated search engine like Lucene/Solr/ES.



