
Is it exclusively HN comments and nothing else? How does a model like that know how to speak English (noun/verb and all that) if you are starting from scratch and feeding it nothing but HN comments?



I'm sorry to be THAT GUY, but it is addressed in the article :)

>GPT embeddings

To index these stories, I loaded up to 2000 tokens worth of comment text (ordered by score, max 2000 characters per comment) and the title of the article for each story and sent them to OpenAI's embedding endpoint, using the standard text-embedding-ada-002 model. This endpoint accepts bulk uploads and is fast, but all 160k+ documents still took over two hours to embed. Total cost for this part was around $70.
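For anyone wondering what that packing step might look like, here is a minimal sketch, not the article's actual code. The function names, the rough 4-characters-per-token estimate, and the choice to stop at the first comment that would overflow the budget are all my assumptions; a real pipeline would count tokens with the model's tokenizer.

```python
def estimate_tokens(text):
    # Assumption: roughly 4 characters per token. This is only a heuristic;
    # a real tokenizer would give the exact count.
    return max(1, len(text) // 4)


def build_document(title, comments, token_budget=2000, max_chars=2000):
    """Pack a story title plus its top comments into one text to embed.

    comments is a list of (score, text) pairs (a hypothetical shape).
    """
    parts = [title]
    used = estimate_tokens(title)
    # Highest-scored comments first, as the article describes.
    for _score, text in sorted(comments, key=lambda c: c[0], reverse=True):
        clipped = text[:max_chars]  # cap each comment at max_chars characters
        cost = estimate_tokens(clipped)
        if used + cost > token_budget:
            break  # assumption: stop at the first comment that does not fit
        parts.append(clipped)
        used += cost
    return "\n\n".join(parts)
```

The resulting string is what would be sent to the embedding endpoint, one per story.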


In a nutshell, this uses OpenAI's API to generate embeddings for the top comments on HN, then generates an embedding for the search term as well. It finds the comments most closely related to the question by comparing the embeddings, then sends their actual text to GPT-3 to summarize. It's a pretty clever way to do it.
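Roughly, the retrieval half could be sketched like this. It is an illustration only: `toy_embed` is a bag-of-words stand-in I made up so the example runs offline, where the real thing would call the embedding endpoint, and `build_index`, `search`, and `build_prompt` are hypothetical names. Cosine similarity is one common way to compare embeddings; the article may use a different measure.

```python
import math
from collections import Counter


def toy_embed(text):
    # Stand-in for a real embedding model: plain word counts.
    return Counter(text.lower().split())


def cosine(a, b):
    # Cosine similarity between two sparse vectors stored as Counters.
    dot = sum(a[k] * b[k] for k in a if k in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def build_index(docs, embed=toy_embed):
    # Embed every document once, up front (the slow, paid step).
    return [(doc, embed(doc)) for doc in docs]


def search(query, index, k=2, embed=toy_embed):
    # Embed the query, then rank stored documents by similarity to it.
    q = embed(query)
    ranked = sorted(index, key=lambda item: cosine(q, item[1]), reverse=True)
    return [doc for doc, _vec in ranked[:k]]


def build_prompt(query, docs):
    # The retrieved text, not the vectors, is what goes to the summarizer.
    context = "\n---\n".join(docs)
    return f"Summarize what these comments say about: {query}\n\n{context}"


index = build_index([
    "rust is memory safe",
    "python is great for scripting",
    "best pizza in new york",
])
hits = search("is rust safe", index, k=2)
prompt = build_prompt("is rust safe", hits)
```

The point is that the language model only ever sees a handful of retrieved comments plus the question, so nothing is trained from scratch on HN.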


> How does a model like that know how to speak English

Mimicry.



