

_1 7 months ago | on: Smollm3: Smol, multilingual, long-context reasoner...

We do use WebLLM and a hosted Weaviate database, but there are complaints about speed (both retrieval and time to first token as the context gets big). The Gemma 3n "nesting doll" approach sounds like it could be useful, but I haven't found anyone specifically using it to add domain-specific knowledge.

janalsncm 7 months ago

In my experience, retrieval is typically the fast part. Have you considered cheaper retrieval methods? BM25 does pretty well on its own, and you can augment your dataset by precomputing relevant queries for each doc.
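
To make the suggestion concrete, here is a rough sketch of BM25 plus query augmentation in plain Python. Everything in it is an assumption for illustration: the `BM25` class, the `k1`/`b` defaults, the toy docs, and the hand-written `precomputed_queries` (in practice you might generate those offline with an LLM). It is not what Weaviate or WebLLM does internally.

```python
import math
import re
from collections import Counter


def tokenize(text):
    # Lowercase and keep alphanumeric runs only (a deliberately naive tokenizer).
    return re.findall(r"[a-z0-9]+", text.lower())


class BM25:
    """Minimal in-memory Okapi BM25 index (illustrative, not tuned)."""

    def __init__(self, docs, k1=1.5, b=0.75):
        self.k1 = k1
        self.b = b
        tokens = [tokenize(d) for d in docs]
        self.doc_len = [len(t) for t in tokens]
        self.avgdl = sum(self.doc_len) / len(docs)
        self.tf = [Counter(t) for t in tokens]
        # Document frequency: number of docs containing each term.
        df = Counter()
        for t in tokens:
            df.update(set(t))
        n = len(docs)
        # The "+1 inside the log" IDF variant, which stays non-negative.
        self.idf = {
            term: math.log(1 + (n - f + 0.5) / (f + 0.5)) for term, f in df.items()
        }

    def score(self, query, i):
        total = 0.0
        for term in tokenize(query):
            f = self.tf[i].get(term, 0)
            if f == 0:
                continue
            length_norm = 1 - self.b + self.b * self.doc_len[i] / self.avgdl
            total += self.idf[term] * f * (self.k1 + 1) / (f + self.k1 * length_norm)
        return total

    def top_k(self, query, k=3):
        # Return indices of the best-scoring docs, dropping docs with no match.
        scored = [(self.score(query, i), i) for i in range(len(self.tf))]
        scored.sort(key=lambda pair: -pair[0])
        return [i for s, i in scored[:k] if s > 0]


docs = [
    "Reset your password from the account settings page.",
    "Invoices are emailed on the first day of each month.",
    "The API rate limit is 100 requests per minute.",
]

# Hypothetical precomputed queries per doc, written by hand here.
precomputed_queries = [
    ["how do i change my login credentials"],
    ["when do i get billed"],
    ["how many calls can i make"],
]

# Index each doc together with the queries it is expected to answer.
augmented = [d + " " + " ".join(qs) for d, qs in zip(docs, precomputed_queries)]

plain_index = BM25(docs)
augmented_index = BM25(augmented)

query = "when am i billed"
print(plain_index.top_k(query))          # no lexical overlap with the raw docs
print(augmented_index.top_k(query, k=1)) # the invoice doc now matches
```

The point of the augmentation step is that BM25 only matches on shared words, so indexing likely questions alongside each doc closes the vocabulary gap without paying for embeddings at query time.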
