As is often true of GPT responses, there's some nonsense interspersed here, e.g. the claim that R has "a more mature package universe" than Python.
I think this is false, but if you're reading quickly, it sounds cogent enough. As Sarah Constantin observed about GPT2 [0]:
> if you skim text, you miss obvious absurdities. The point is OpenAI HAS achieved the ability to pass the Turing test against humans on autopilot...The mental motion of “I didn’t really parse that paragraph, but sure, whatever, I’ll take the author’s word for it” is, in my introspective experience, absolutely identical to “I didn’t really parse that paragraph because it was bot-generated and didn’t make any sense so I couldn’t possibly have parsed it”, except that in the first case, I assume that the error lies with me rather than the text. This is not a safe assumption in a post-GPT2 world. Instead of “default to humility” (assume that when you don’t understand a passage, the passage is true and you’re just missing something) the ideal mental action in a world full of bots is “default to null” (if you don’t understand a passage, assume you’re in the same epistemic state as if you’d never read it at all.)
> there's some nonsense interspersed here, e.g. the claim that R has "a more mature package universe" than Python.
As a programmer, I find R hard to use and not very well designed, so I can see why you'd call that nonsense.
But when I was a math student, I found that in some ways R does have "a more mature package universe". There were many math algorithms that I could find packages for in R and not in Python, even as a mere grad student.
Absolutely, for statistics and visualization I think R and its packages are (sometimes) superior. But GPT responses don't generally offer those kinds of nuances; the claim is that the packages are "more mature," period. And it's for good reason that the _most_ mature Python packages, e.g. numpy and pandas, are used by data scientists in production pretty much everywhere.
Amazingly, your comment will eventually be added to the ChatGPT corpus and at some point down the line may be used to add the nuance that's currently lacking :)
I wonder if the "default to humility" heuristic does more harm than good on net, because the people who heed it probably shouldn't, and the ones who should won't.
Default to humility. Do not assume you're so smart that you can skim the text and understand it correctly. Read every word, don't assume that the author is so predictable that you can guess correctly.
I think it's important to remember that Humans who are not too smart can also sound coherent, yet babble complete nonsense.
My experience with ChatGPT thus far is that it is as intelligent as a very broadly read person who just doesn't reeeally get the complex or nuanced aspects of the content it reads - much like many real Humans.
"After eighteen years of being a professor, I’ve graded many student essays. And while I usually try to teach a deep structure of concepts, what the median student actually learns seems to mostly be a set of low order correlations. They know what words to use, which words tend to go together, which combinations tend to have positive associations, and so on. But if you ask an exam question where the deep structure answer differs from answer you’d guess looking at low order correlations, most students usually give the wrong answer."
A simple Google search shows that she's likely an expert and that her opinion can be used as a "citation" in one's comment:
> Brief bio: I started out studying math (Princeton AB ‘10, Yale PhD ‘15, focusing on applied harmonic analysis) and then spent some time in the world of data science and machine learning (Palantir, Recursion Pharmaceuticals, Starsky Robotics.)
Her essay is from 2019, and I quoted it mainly to say that I am not making an original point, this has been a known problem with LLMs for a while (and I presume it will continue to be).
It seems to write in the generic "style" of GPT, instead of in the style I would recognise as a HN poster. Is that because of something baked into how the training process works? It lacks a sort of casualness or air of superiority ;)
There was no training process, this is just running GPT with relevant HN comments as part of the prompt.
If he wanted it to replicate that classic HN feel he would either have to extend the prompt with additional examples or, better yet, use finetuning.
I guess he could also just randomly sprinkle in some terms like 'stochastic parrot' and find a way to shoehorn Tesla FSD into every conversation about AI.
> “AskHN” is a GPT-3 bot I trained on a corpus of over 6.5 million Hacker News comments to represent the collective wisdom of the HN community in a single bot.
First sentence of the first paragraph on OP's page
EDIT: it's a bit misleading, further down they describe what looks like a semantic-search approach
I agree, that language could be much improved. This is not a GPT-like LLM whose training corpus is HN comments, which I found to be an extremely interesting idea. Instead, it looks like it finds relevant HN threads and tells GPT-3 (the existing model) to summarize them.
To be clear, I think this is still very cool, just misleading.
Soon we will see language style transfer vectors, akin to the image style transfer at the peak of the ML craze 5-10 years ago -- so you will be able to take a HN snark vector and apply it to regular text, you heard it here first ;)
Joking aside, that does seem like it would be very useful. Kind of reminds me of the analogies that were common in initial semantic vector research. The whole “king - man + woman = queen” thing. Presumably that sort of vector arithmetic is still valid on these new LLM embeddings? Although it still would only be finding the closest vector embedding in your dataset, it wouldn’t be generating text guided by the target embedding vector. I wonder if that would be possible somehow?
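For what it's worth, the classic analogy trick is just vector addition followed by a nearest-neighbour search over the embedding table. A toy sketch with made-up vectors (a real model like word2vec or ada-002 would supply the actual embeddings):

```python
import numpy as np

# Hypothetical embedding table; any real embedding model would supply these vectors.
embeddings = {
    "king":  np.array([0.8, 0.65, 0.1]),
    "queen": np.array([0.8, 0.05, 0.7]),
    "man":   np.array([0.1, 0.70, 0.1]),
    "woman": np.array([0.1, 0.10, 0.7]),
}

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# "king - man + woman" ~ "queen": do the arithmetic, then find the nearest
# stored vector, excluding the query words themselves.
target = embeddings["king"] - embeddings["man"] + embeddings["woman"]
best = max(
    (w for w in embeddings if w not in {"king", "man", "woman"}),
    key=lambda w: cosine(target, embeddings[w]),
)
print(best)  # -> "queen" with these toy vectors
```

And yes, as noted above, this only retrieves the closest existing vector; generating new text that is guided toward a target embedding is a different and harder problem.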
Last year (pre the chatGPT bonanza) I was using GPT-3 to generate some content about attribution bias and the responses got much spicier once the prompt started including the typical HN poster lingo, like "10x developer":
To truly capture the HN experience, the user should provide a parameter for the number of "well actually"'s they want to receive.
So the initial response should demonstrate clear expertise and make a great, concise point in response to the question, and then start the cascade of silly nitpicking.
I wish the results were reversed, so I could "well actually" your comment, but 'site:news.ycombinator.com "well actually"' gives ca. 4k results in Google and 'site:news.ycombinator.com "I think you'll find"' gives close to 17k results, so you appear to be right.
Am I correct in understanding that this doesn't actually generate answers based on HN, but instead finds semantically-near comments and sends them verbatim to GPT to summarize? Seems like a good enough hack, though I'd love a detailed writeup of how to actually specialize an existing LLM with additional training data (like HN).
Technically it does give a specific answer to the question, but it is based on the semantically similar comments (and the question).
The thing people don't realize is that right now there is a very large gap between the capabilities of a few models, including OpenAI's most recent ones, and most of the other LLMs. So there are several options for actually training or fine-tuning with open models, but none of them have language understanding and generation capabilities at the level of those new OpenAI models.
The summary itself is still generated, but has all the context to do summarization in the prompt.
It's very difficult to otherwise fine-tune existing LLMs. GPT itself is closed-source, and doesn't allow for fine-tuning (except via an opaque API and with limited amounts of data). Other open models are either very difficult to load in memory and/or simply not as expressive as GPT.
Agreed, I think the better approach is to do some custom tuning, but that becomes cost-prohibitive very quickly. Not really much different from Algolia with a minor GPT-3 integration, but a neat project regardless.
You can literally finetune these OpenAI models using their API. In this case it probably wasn't done because the author found it too much work and/or too expensive.
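For reference, a rough sketch of what that looked like with the GPT-3-era openai Python library (0.x); the JSONL prompt/completion format is OpenAI's documented fine-tuning format, though the method names have changed in newer library versions:

```python
import json
import openai  # 0.x-era openai library

openai.api_key = "sk-..."  # your API key

# Fine-tuning data is a JSONL file of prompt/completion pairs.
examples = [
    {"prompt": "Ask HN: Python or R for data science?\n\n###\n\n",
     "completion": " Use whichever your team already knows; the libraries overlap heavily. END"},
    # ... ideally hundreds of examples, not a handful
]
with open("askhn.jsonl", "w") as f:
    for ex in examples:
        f.write(json.dumps(ex) + "\n")

# Upload the file, then start a fine-tune job on a base model.
upload = openai.File.create(file=open("askhn.jsonl", "rb"), purpose="fine-tune")
job = openai.FineTune.create(training_file=upload.id, model="davinci")
print(job.id)  # poll this job; the result is a custom model name usable with the completion endpoint
```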
Question: what are the biggest fallacies of people that post on hackernews?
Answer: One of the biggest fallacies of people that post on Hacker News is the nirvana fallacy, which is when someone jumps from "thing x is imperfect" to "thing x is worthless" without considering any of the benefits. People also tend to be familiar with certain technologies and will discount the flaws of existing technology vs. the flaws of new technology. Another fallacy is the assumption that there is one best tool for doing things - the one that supports the largest amount of use cases. Additionally, people may use appeals to authority, invent their own solutions rather than use existing ones, and assume that the approaches they first learned are most correct. Finally, people may assume open source software is a gift and that dependencies don't need to be tested.
> People also tend to be familiar with certain technologies and will discount the flaws of existing technology vs. the flaws of new technology.
Is that really a problem? It's only natural to get accustomed to old problems caused by old tech (where workarounds probably already exist or limitations understood and accepted) while it makes sense to hold new tech to a higher standard since it has the benefit of learning from past mistakes and should avoid introducing obvious problems for people in the future to deal with. Our future selves will thank us for being more critical of flawed new tech than we are of old tech.
I'm a little surprised that Hacker News comments weren't already in the GPT-3 training set. I just assumed that OpenAI had vacuumed up most of the web already.
If it's really trained exclusively off of HN comments, I expect most of the bot's responses will evade the actual question but spend several paragraphs debating the factual specifics of every possible related tangential point, followed by a thinly-veiled insult questioning the user's true motivations.
In no way does a typical HN comment debate every possible related tangential point. Do we expect a modicum of intellectual rigor? Yes. But to say every tangent is followed and scrutinized is simply factually untrue.
And several paragraphs? I challenge you to show even a large minority of argumentative responses that veer into "several" paragraphs. You characterize this as "most of the ... responses" but I think that's unfair.
One wonders why you'd resort to such hyperbole unless you were deliberately attempting to undermine the value of the site.
Is it exclusively HN comments and nothing else? How does a model like that know how to speak English (noun/verb and all that) if you are starting from scratch and feeding it nothing but HN comments?
I'm sorry to be THAT GUY, but it is addressed in the article :)
> GPT embeddings
To index these stories, I loaded up to 2000 tokens worth of comment text (ordered by score, max 2000 characters per comment) and the title of the article for each story and sent them to OpenAI's embedding endpoint, using the standard text-embedding-ada-002 model, this endpoint accepts bulk uploads and is fast but all 160k+ documents still took over two hours to create embeddings. Total cost for this part was around $70.
In a nutshell, this is using OpenAI's API to generate embeddings for top comments on HN, then also generating an embedding for the search term. It can then find the comments most closely related to the given question by comparing the embeddings, and then send the actual text to GPT-3 to summarize. It's a pretty clever way to do it.
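Roughly, the retrieval half looks something like this with the 0.x-era openai Python library (the documents and batching here are placeholders; the write-up's actual code will differ in detail):

```python
import numpy as np
import openai  # 0.x-era openai library

openai.api_key = "sk-..."

def embed(texts):
    """Embed a batch of strings with text-embedding-ada-002."""
    resp = openai.Embedding.create(model="text-embedding-ada-002", input=texts)
    return np.array([d["embedding"] for d in resp["data"]])

# One document per story: title plus top comments (placeholder data).
docs = [
    "Ask HN: Python or R? | Top comments: ...",
    "Ask HN: How to prep for a Google interview? | Top comments: ...",
]
doc_vecs = embed(docs)  # in the real project this is ~160k documents, computed once and stored

def top_k(question, k=3):
    """Return the k documents whose embeddings are closest to the question's."""
    q = embed([question])[0]
    # ada-002 embeddings are unit-length, so a dot product is a cosine similarity
    sims = doc_vecs @ q
    return [docs[i] for i in np.argsort(-sims)[:k]]

print(top_k("Should I learn Python or R for data science?"))
```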
I have to assume that targeted/curated LLM training sets will have a tendency to be less accurate than very general ones, just by the very nature of how they work.
I know it's not quite analogous, but I fine-tuned GPT-3 on a small (200 examples) data set and it performed extremely poorly compared to the untrained version.
This surprised me, I thought it wouldn't do much better, but I wasn't expecting that specializing it on my target data would reduce performance! I had fewer examples than the minimum OpenAI recommends, so maybe it was a case of overfitting or something like that.
My own experiments made me think that the impact of finetuning is comparable to that of a molecule in a drop in a bucket.
> “AskHN” is a GPT-3 bot I trained on a corpus of over 6.5 million Hacker News comments to represent the collective wisdom of the HN community in a single bot.
I'm assuming you used the openai fine-tuning pathway to make a custom model?
Have you tested the responses on vanilla GPT3 vs your custom model?
The semantic search approach seems to focus the answers better than fine-tuning, at the cost of preloading the prompt with a lot of tokens but with the benefit of a more constrained response.
Yeah. Also full of GPT-3isms like "ultimately the choice ... comes down to the specific project and its ... requirements" and not nearly contrarian enough
A bot focused on the output of HNers would insist on providing arguments against going through Google's interview process in the first place and suggestions that the correct answer to "Python or R" should be Haskell or Julia and would never suggest prioritising emotional vulnerability or being a happy person!
This might be a dumb question, but is this based on the collective wisdom of HN? Because I would say that the collective wisdom is just as much in the interaction of the comments and the ranking of those comments as it is in the comments themselves. If you just ingest all the comments wholesale, aren't you rather getting the average wisdom of HN?
Let's admit that HN's culture is that many of us are confidently wrong, which we cover up with impressive technical jargon. As such, any wrong answer in this AI is in fact correct.
I love this! I used to append "reddit" to my Google search queries to get best results, but the quality of dialog over there has really dropped in recent years. These days I've switched to appending "hackernews", but this is even better.
Nah, it's no big deal, it's not like Cambridge Analytica will happen again. They're just using your data to train AI. Who knows, maybe based on the way you comment you'll get suggestions on which medication you need, or whether it's time for the Red Bull/Starbucks coffee. Nah, all is good. Nothing bad will happen in allowing companies to scrape comments and build models. They're very ethical.
In fact, people here are suddenly not so concerned that the model is not open. There is no oversight on how the data is being used.
They are just proud to get answers from a text generator.
> The BIG DEAL is...the fact that the ML crowd think it's OK to take everything without even asking permission
Everything they take was freely given. Thrown into the void. Screamed into the wind. It's weird that people are perfectly fine if someone happens to read their words (at all) and fine if some of those who do read them manage to find something in them that is in any way helpful or useful, but the moment they think someone else might make money as a result of something gained from exposure to those same words it's somehow offensive and everyone starts demanding a cut of (usually non-existent) profit.
The "ML" crowd has just as much a right to read and learn from the words I enter on social media platforms as anyone else. I'm not charging any kind of fee for the words of debatable wisdom, fact checking, or shitposting I "contribute". I didn't ask permission before replying to your comment. Why should anyone feel like they should ask for permission from me to read it? What exactly is "taken" from me beyond the time I voluntarily spent participating in online discourse?
I think I should've put an /s at the end.
It's kind of strange that I see constant discussions here and people harassing small apps/libraries about how their error collection is not OPT-IN. The whole Audacity debacle. But data collection for training ML models is perfectly fine, because we sure do know how the companies who fund the research will get an ROI.
> Banana Sebastian housewares fly swimmingly under terrestrial Zruodroru'th Memphis Steve Jobs archipelagos
It's actually more likely to require a bathtub to increase the volume of the reticulated lorries, so I really don't think a farmer's market is the ideal place.
I agree: when I signed up, I never agreed to let anybody use what I write to do anything they want! I only agreed to let everybody read, understand, and interact with what I wrote.
Actually, it makes me feel as bad as knowing that CAPTCHAs were used to train image recognition models...
I think it could be a good time to reconsider the question of consent. I may agree that my words are used to train some AI... but 1) I must be asked (kindly) first and 2) it won't be free!!! (It may be paid to me or to the service provider like HN... but it's NOT unpaid work ;-) )
Is there any LLM that can be self-hosted and fed a corpus of data to ingest for question answering? The part I find difficult is how to feed (not train) the open LLM models with a dataset that isn't available to the public.
The hack to solve this is to embed each paragraph in your large corpus. Find the paragraphs most similar to the user query using embeddings. Put those paragraphs and the raw user query into a prompt template. Send the final generated prompt to GPT-3.
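A minimal sketch of the last two steps (prompt template plus generation), assuming you've already found the nearest paragraphs with the embedding trick sketched elsewhere in this thread; the model name and parameters are just illustrative:

```python
import openai  # 0.x-era openai library

openai.api_key = "sk-..."

PROMPT_TEMPLATE = """Answer the question using only the context below.
If the context doesn't contain the answer, say you don't know.

Context:
{context}

Question: {question}
Answer:"""

def answer(question, similar_paragraphs):
    # similar_paragraphs: the nearest-neighbour paragraphs found via embeddings
    prompt = PROMPT_TEMPLATE.format(
        context="\n\n".join(similar_paragraphs),
        question=question,
    )
    resp = openai.Completion.create(
        model="text-davinci-003",  # any instruction-following completion model
        prompt=prompt,
        max_tokens=300,
        temperature=0.2,
    )
    return resp["choices"][0]["text"].strip()
```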
ChatGPT and friends always talk like those Microsoft and Apple forum responders with 100k reputation.
I see that you are asking about "How to get a job at Google". I will help you with "How to get a job at Google". In order to solve the problem of "How to get a job at Google" please follow the following steps first:
- rewrite your resume in Google Docs
- reinstall Chrome
- apply to the job
Let me know if I can help further with "How to get a job at Google".

I like using it, but I have to tune my prompts to make sure that they don't bullshit me before getting to the point.
I like the project. Had been wanting to do this myself for a long time, because HN has become the first place I go to nowadays for answers, and I value the intelligence and experience distilled in the comments here.
I do not like that it seems to be effectively an ad.
> Embedding every single one of the 6.5 million eligible comments was prohibitively time-consuming and expensive (12 hours and ~$2,000).
Does anybody understand what he’s talking about here? Assuming 6.5 million comments and an average token length of 70, we'd be looking at $180 ($0.0004 / 1K tokens).
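Back-of-the-envelope, with those assumptions, the quoted ada-002 rate does come out to roughly $180:

```python
comments = 6_500_000
avg_tokens_per_comment = 70     # assumed average, per the comment above
price_per_1k_tokens = 0.0004    # ada-002 embedding price quoted above, $ per 1K tokens

total_tokens = comments * avg_tokens_per_comment   # 455,000,000 tokens
cost = total_tokens / 1000 * price_per_1k_tokens   # = $182
print(f"${cost:,.0f}")
```

Which is nowhere near ~$2,000, hence the confusion.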
Nice. I just sort of assumed early on my comments were training some future AI, and I hope that in some small way I have been able to moderate some of its stupider urges.
A version where you can turn knobs of flavored contributors would be pretty funny. I know my comment style is easily identifiable and reproducible, and it encodes a certain type of logical conjugation, albeit biased with some principles and trigger topics, and I think there is enough material on HN that there may be such a thing as a distinct, motohagiographic lens. :)
Some day I will sue people like OP (if they're monetizing it) and OpenAI for monetizing my public posts. You can use, reuse and alter public speech but when you earn ad dollars...yeah part of that is mine if your model used my public content. I probably won't actually sue but someone will.
I am not a lawyer but there has to be a jurisdiction where I can establish standing at least.
This I think would be a great little SaaS idea to make some money. I keep seeing more and more people asking how they can transform their data into an interactive archive that responds as chat, or with voice.
I have an experiment that uses the embeddings to visualize clusterings of HN comments (using t-SNE). Not super useful, but it's interesting to view the comments in 3D and see how similar ones cluster together into mostly relevant themes.
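For anyone who wants to try something similar, the core of it is just running t-SNE over the stored embedding matrix. A minimal sketch with scikit-learn, assuming the ada-002 embeddings are already saved as a numpy array (file names here are placeholders):

```python
import numpy as np
from sklearn.manifold import TSNE

# comment_vecs: (n_comments, 1536) array of ada-002 embeddings, loaded from wherever you stored them
comment_vecs = np.load("hn_comment_embeddings.npy")

# Project down to 3 dimensions for a 3D scatter plot.
tsne = TSNE(n_components=3, perplexity=30, init="pca", random_state=0)
coords = tsne.fit_transform(comment_vecs)

# Plot with matplotlib/plotly, colouring points by topic or score.
np.save("hn_comment_tsne_3d.npy", coords)
```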
Hmm. I thought perhaps he was going to take the questions from Ask HN and the top upvoted comments and fine-tune a model with those as the prompt/reply pairs.
Curious how that would differ, but it would be an expensive endeavour.
Seeing a ton of projects utilizing ChatGPT nowadays. Are the project owners basically paying the API costs out of pocket? I think it would add up pretty quickly, especially if it hits the front page of HN.
Just to be sure: this is NOT a fine-tuned GPT model, but rather the standard GPT-3 API, used to summarize search results from an HN comments DB, based on user input. Right?
You'd probably need to prepend a prompt that told the bot how to analyze experiment design. Maybe have it read a book or 10 on experiment design. Also a few books on social networks, financial motivations and other human factors in science. Then let it take a look at journal articles and their metadata. In short, you need a way to vet for quality.
It looks interesting, but posting it on random threads of HN will make users flag your post and mods ban your account.
The post definitely needs more info! Who are you? How do you pick the kids? Are you the "teacher", an "organizer", or just someone enthusiastic who is related to the project? Programming language? Age of the kids? Have you done something similar before? Length of the course? Why do you need money?
Try to write a long post answering all those questions and perhaps a few more, but not too long. Make a new post, and then make a comment explaining you are the [teacher or whatever], and be ready to reply to the comments in the thread.
[0] https://www.skynettoday.com/editorials/humans-not-concentrat...