Hacker News

There was no training process, this is just running GPT with relevant HN comments as part of the prompt.

If he wanted it to replicate that classic HN feel, he would have to either extend the prompt with additional examples or, better yet, use finetuning.

I guess he could also just randomly sprinkle in some terms like 'stochastic parrot' and find a way to shoehorn Tesla FSD into every conversation about AI.
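For what it's worth, "extending the prompt with additional examples" is just few-shot prompting. A minimal sketch of the prompt construction (the style examples are invented placeholders, and the actual model call is omitted):

```python
# Few-shot prompting: prepend style examples so the model imitates the register.
# The example comments below are invented, not real HN comments.
STYLE_EXAMPLES = [
    "This is just a worse version of something Lisp had in the 80s.",
    "Sure, it scales -- until you read the fine print on the pricing page.",
]

def hn_style_prompt(question, examples=STYLE_EXAMPLES):
    """Build a prompt that asks the model to answer in the example style."""
    shots = "\n".join(f"Comment: {e}" for e in examples)
    return (
        "Reply to the question in the style of the comments below.\n"
        f"{shots}\n"
        f"Question: {question}\n"
        "Comment:"
    )

prompt = hn_style_prompt("Is Rust ready for production?")
# `prompt` would then be sent to a completions endpoint; finetuning instead
# bakes the style into the weights so no examples are needed at query time.
```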




> “AskHN” is a GPT-3 bot I trained on a corpus of over 6.5 million Hacker News comments to represent the collective wisdom of the HN community in a single bot.

First sentence of the first paragraph on OP's page

EDIT: it's a bit misleading; further down they describe what looks like a semantic-search approach


Scroll a bit further down and you will see

> 7. Put top matching content into a prompt and ask GPT-3 to summarize

> 8. Return summary along with direct links to comments back to Discord user
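Steps 7 and 8 describe a retrieve-then-summarize pipeline (what's now usually called retrieval-augmented generation). A minimal sketch, with a toy bag-of-words embedding standing in for real GPT-3 embeddings and stopping where the actual completion call would go; all function names and example comments here are illustrative, not from OP's code:

```python
import math
from collections import Counter

def embed(text):
    """Toy embedding: bag-of-words token counts. A real pipeline would call
    an embedding model here instead of counting words."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = lambda v: math.sqrt(sum(x * x for x in v.values()))
    return dot / (norm(a) * norm(b) or 1.0)

def top_matches(query, comments, k=2):
    """Rank stored comments against the query (the retrieval half of step 7)."""
    q = embed(query)
    ranked = sorted(comments, key=lambda c: cosine(q, embed(c)), reverse=True)
    return ranked[:k]

def build_prompt(query, matches):
    """Stuff the top matches into a summarization prompt (rest of step 7)."""
    context = "\n".join(f"- {c}" for c in matches)
    return (f"Summarize what these Hacker News comments say about: {query}\n"
            f"{context}\nSummary:")

comments = [
    "Rust's borrow checker eliminates whole classes of memory bugs.",
    "I switched from Python to Rust and never looked back.",
    "Sourdough starters need daily feeding at room temperature.",
]
matches = top_matches("rust memory safety", comments)
prompt = build_prompt("rust memory safety", matches)
# Step 8 would send `prompt` to GPT-3 and return the summary plus comment links.
```

Note that GPT-3 itself only ever sees the handful of retrieved comments, not the 6.5M-comment corpus, which is what the thread above is pointing out.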


Ah got it. Perhaps they should edit the intro then, it's misleading.


I agree, that language could be much improved. This is not a GPT-style LLM trained on a corpus of HN comments, which would have been an extremely interesting idea. Instead, it looks like it finds relevant HN threads and asks GPT-3 (the existing model) to summarize them.

To be clear, I think this is still very cool, just misleading.


Soon we will see language style transfer vectors, akin to the image style transfer at the peak of the ML craze 5-10 years ago -- so you will be able to take an HN snark vector and apply it to regular text. You heard it here first ;)


Joking aside, that does seem like it would be very useful. Kind of reminds me of the analogies that were common in initial semantic vector research. The whole “king - man + woman = queen” thing. Presumably that sort of vector arithmetic is still valid on these new LLM embeddings? Although it still would only be finding the closest vector embedding in your dataset, it wouldn’t be generating text guided by the target embedding vector. I wonder if that would be possible somehow?
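The arithmetic the comment above refers to is a nearest-neighbor search over vec(king) - vec(man) + vec(woman), excluding the input words. A sketch with tiny hand-built vectors (purely illustrative; real learned embeddings are high-dimensional and much noisier, and whether the analogy trick transfers cleanly to LLM embeddings is exactly the open question posed above):

```python
import numpy as np

# Toy 3-d embeddings with explicit "royalty", "male", "female" axes.
# Real word vectors are learned, not hand-assigned like this.
vocab = {
    "king":  np.array([1.0, 1.0, 0.0]),
    "queen": np.array([1.0, 0.0, 1.0]),
    "man":   np.array([0.0, 1.0, 0.0]),
    "woman": np.array([0.0, 0.0, 1.0]),
    "apple": np.array([0.0, 0.0, 0.0]),  # unrelated filler word
}

def analogy(a, b, c):
    """Return the vocab word closest (by cosine) to vec(a) - vec(b) + vec(c),
    excluding the three input words -- the standard word-vector analogy trick."""
    target = vocab[a] - vocab[b] + vocab[c]
    def cos(u, v):
        return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v) + 1e-9))
    candidates = {w: v for w, v in vocab.items() if w not in (a, b, c)}
    return max(candidates, key=lambda w: cos(candidates[w], target))

print(analogy("king", "man", "woman"))  # → queen
```

As the comment notes, this only ever *finds* the nearest existing vector; generating text steered by a target embedding would need something else, e.g. conditioning the decoder on that vector.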


Hmm. If you're willing to be stuck in time at 2016, there's https://zenodo.org/record/45901

Build a model off of that?


Last year (before the ChatGPT bonanza) I was using GPT-3 to generate some content about attribution bias, and the responses got much spicier once the prompt started including typical HN poster lingo, like "10x developer":

https://sonnet.io/posts/emotive-conjugation/#:~:text=I%27m%2...

My conclusion was that you can use LLMs to automate and scale attribution bias.

We did it guys!



