
I just gave it a whirl. Pretty neat, but definitely watch out for hallucinations. For instance, I asked it to compile a report on myself (vain, I know.) In this 500-word report (ok, I'm not that important, I guess), it made at least three errors.

It stated that I had 47,000 reputation points on Stack Overflow -- quite a surprise to me, given my minimal activity on Stack Overflow over the years. I popped over to the link it had cited (my profile on Stack Overflow) and it seems it confused my number of people reached (47k) with my reputation, a sadly paltry 525.

Then it cited an answer I gave on Stack Overflow on the topic of monkey-patching in PHP, using this as evidence for my technical expertise. Turns out that about 15 years ago, I _asked_ a question on this topic, but the answer was submitted by someone else. Looks like I don't have much expertise, after all.

Finally, it found a gem of a quote from an interview I gave. Or wait, that was my brother! Confusingly, we founded a company together, and we were both mentioned in the same article, but he was the interviewee, not I.

I would say it's decent enough for a springboard, but you should definitely treat the output with caution and follow the links provided to make sure everything is accurate.






"Pretty neat, but definitely watch out for hallucinations."

We'd never hire someone who just makes stuff up (or at least keep them employed for long). Why are we okay with calling "AI" tools like this anything other than curious research projects?

Can't we just send LLMs back to the drawing board until they have some semblance of reliability?


> Why are we okay with calling "AI" tools like this anything other than curious research projects?

Because they are a way to launder liability while reducing costs to produce a service.

Look at the AI-based startups Y Combinator has been funding. They match that description.


> We'd never hire someone who just makes stuff up (or at least keep them employed for long).

This is contrary to my experience.


Our president begs to differ! Or pretty much any elected official for that matter.

Not everything is politics. Your president already gets too much of the media spotlight.

> Can't we just send LLMs back to the drawing board until they have some semblance of reliability?

Well, at this point they've certainly proven a net gain for everyone, regardless of the occasional nonsense they spew.


No, the research findings around this are mixed. There is no consensus that it's a net gain.

That is... debatable. You may be entirely inside the bubble, there.

Not sure if this was posted as humour, but I don't feel that way. In today's world, where I certainly would consider taking the blue pill, I'm having a blast with LLMs!

It has helped me learn stuff incredibly fast. I find them especially useful for filling gaps in my knowledge and exploring new topics in my own way and language, without needing to wait for an answer from a human (which could also be wrong).

Why does it feel to you that "we are entirely inside the bubble"?


Are you sure it's helped you learn?

In the early days of ChatGPT, when it seemed like this fun new thing, I used it to "learn" C. I don't remember anything it told me, and none of the answers it gave me were anything I couldn't have found elsewhere in different forms - heck, I could have flipped open Kernighan & Ritchie to the right page and gotten the answer.

I had a conversation with an AI/Bitcoin enthusiast recently. Maybe that already tells you everything you need to know about this person, but to hammer the point home, they made a claim similar to yours: "I learn much more and much better with AI". They also said they "fact check" things it "tells" them. A few moments later they told me "Bitcoin has its roots in Occupy Wall Street".

A simple web search tells you that Bitcoin was conceived a full two years before Occupy. How can they be related?

It's a simple error that can be fact checked simply. It's a pretty innocuous falsity in this particular case - but how many more falsehoods have they collected? How do those falsehoods influence them on a day-by-day basis?

How many falsehoods influence you?

A very well-meaning activist posted a "comprehensive" list of all the programs that were to be halted by the grant and loan freezes last week. Some of the entries on the list weren't real, or weren't related to the freeze. They revealed they had used ChatGPT to help compile the list and then gone through it one by one to verify each entry.

Even with such meticulous attention to detail, incorrect information still filtered through.

Are you sure you are learning?


I guess the real learning happens outside the AI, here in real life. Does the code run? Sure, it's on my local machine and not in production, but I would never have had the patience to get "that new thing working" without AI as an assistant.

Does the food taste good? Oops, there are a bit too many vegetables here; they are never gonna fit in this pan of mine. Not a big deal, next time I'll be wiser.

AI is like a hypothesis machine. You're gonna have to figure out if the output is true. A few years ago, testing any machine's "intelligence" was pretty quickly done, and the machine failed miserably. Now, the accuracy is astonishing in comparison.

> How many falsehoods influence you?

That is a great question. The answer is definitely not zero. I try to live by a hacker mentality and I'm an engineer by trade. I read news and comments, which I'm not sure is good for me. But you also need some compassion towards yourself. It's not like ripping everything open will lead to salvation. I believe the truth does set you free, eventually. But all in good time...

Anyway, AI is a tool like any other. Someone will hammer their fingers with it. I just don't understand the hate. It's not like we're drinking any AI Kool-Aid here. It's just like it was 30 years ago (in my personal journey): you had a keyboard and a machine, you asked it things and got gibberish. Now the conversation has just started to get interesting. Peace.


When your bitcoiner friend told you something that's not true, that's a human who hallucinated, not an LLM.

Maybe we're already at AGI and just don't know it because we overestimate the capabilities of most humans.


The assertion is that they "learned" from an AI that Bitcoin came from Occupy.

If AI is teaching you, you are going to collect a thousand papercuts of lies.


> It has helped me learn stuff incredibly fast. I find them especially useful for filling gaps in my knowledge and exploring new topics in my own way and language

and then you verify every single fact it tells you via traditional methods, confirming each one in human-written documents, right?

Otherwise, how do you use the LLM for learning? If you don't know the answer to what you're asking, you can't tell if it's lying. It also can't tell if it's lying, so you can't ask it.

If you have to look up every fact it outputs using traditional methods anyway, why not skip straight to looking things up the old-fashioned way and save time?

Occasionally an LLM helps me surface unknown keywords that make traditional searches easier, but LLMs can't teach anything because they don't know anything. They can imagine things you might be able to learn from a real authority, but that's it. That can be useful! But it's not useful for learning alone.

And if you're not verifying literally everything an LLM tells you... are you sure you're learning anything real?


I guess it all depends on the topic and levels of trust. How can I be certain that I have a brain? I just have to take something for granted, don't I? Of course I will "verify" the "important stuff", but what is important? How can I tell? Most of the time the only thing I need is a pointer in the right direction. Wrong advice? I'll know when I get there, I suppose.

I can remember numerous things I was told while growing up that aren't actually true - either plain lies and rumours, or products of the long list of our cognitive biases.

> If you have to look up every fact it outputs using traditional methods anyway, why not skip straight to looking things up the old-fashioned way and save time?

What is the old-fashioned way? I mean, people learn "truths" these days from TikTok and YouTube. Some of the stuff is actually very good; you just have to filter it based on what you were taught at school. Nobody has yet declared LLMs a substitute for schools - maybe they soon will - but neither "guarantees" us anything. We could just as well be taught political agendas.

I could order a book about construction, but I wouldn't build a house without asking a "verified" expert. Some people build anyway and we get some catastrophic results.

Levels of trust: it's all games and play until it gets serious, like what to eat or doing something that involves life-threatening physics. I take it as playing with a toy. Surely something great has come out of only a few pieces of Lego?

> And if you're not verifying literally everything an LLM tells you... are you sure you're learning anything real?

I guess you shouldn't do it that way. But really, so far the topics I've rigorously explored with ChatGPT, for example, have been covered better than in your average journalism. What is real?


> What is the old-fashioned way?

Looking in a resource written by someone with sufficient ethos that they can be considered trustworthy.

> What is real?

I'm not arguing ontology about systems that can't do arithmetic. You're not arguing in good faith at all.


Saying you need to verify "literally everything" both overestimates the frequency of hallucinations and underestimates the amount of wrong information found in human-written sources. For example, the infamous case of Google's AI recommending Elmer's glue on pizza was literally a human-written suggestion first: https://www.reddit.com/r/Pizza/comments/1a19s0/my_cheese_sli...

The Gell-Mann amnesia effect applies to LLMs as well!

https://en.m.wikipedia.org/wiki/Gell-Mann_amnesia_effect


> without needing to wait for an answer from a human (which could also be wrong).

The difference is you have some reassurances that the human is not wrong - their expertise and experience.

The problem with LLMs, as demonstrated by the top-level comment here, is that they constantly make stuff up. While you may think you're learning things quickly, how do you know you're learning them "correctly", for lack of a better word?

Until an LLM can say "I don't know", I really don't think people should be relying on them as a first-class method of learning.


You overestimate the importance of being correct

"Occasional nonsense" doesn't sound great, but would be tolerable.

Problem is - LLMs pull answers from their behind, just like a lazy student on an exam. "Hallucinations" is the word people use to describe this.

Those are extremely hard to spot - unless you happen to know the right answer already, at which point, why ask? And they are everywhere.

One example - recently there was quite a discussion about LLMs being able to understand (and answer) base16 (aka "hex") encoding on the fly, so I went on to try base64, gzipped base64, zstd-compressed base64, etc...

To my surprise, the LLM got most of those encodings/compressions right, decoded/decompressed the question, and answered it flawlessly.

But with a few encodings, the LLM detected base64 correctly, identified the compression algorithm correctly, and then... instead of decompressing, made up a completely different payload and proceeded to answer that. Without any hint that anything sinister was going on.
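
(If anyone wants to reproduce this, here's roughly how such a payload can be prepared - a minimal sketch in Python using the standard base64 and gzip modules; the question text is just a placeholder.)

    import base64, gzip

    question = "What is the capital of Australia?"  # placeholder question

    # plain base64
    plain_b64 = base64.b64encode(question.encode()).decode()

    # gzip first, then base64 (the "gzipped base64" variant)
    gz_b64 = base64.b64encode(gzip.compress(question.encode())).decode()

    print(plain_b64)
    print(gz_b64)

Paste the resulting string into the chat and ask the model to decode it and answer the question inside.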

We really need LLMs to reliably calculate and express confidence. Otherwise they will remain mere toys.


Yeah, what you said represents a 'net gain' over not having any of that at all.

I think as these things get more integrated into customer service workflows - especially for things like insurance claims - there's gonna start being a lot more buyer's remorse on everyone's part.

We've tried for decades to turn people into reliable robots, now many companies are running to replace people robots with (maybe less reliable?) robot-robots. What could go wrong? What are the escalation paths going to be? Who's going to be watching them?


A net gain for everyone? Tell that to the artists it's screwing over!

Why not just verify the output? It’s faster than generating the entire thing yourself. Why do you need perfection in a productivity tool?

At that point why not just... I dunno, do the research yourself?

Perhaps because the time to proofread/correct is less than to do it from scratch? That would still make it a valuable tool

How?

It's given you some information and now you have to seek out a source to verify that it's correct.

Finding information is hard work. It's why librarian is a valuable, skilled profession. By suggesting that I "verify" or "proofread" what a glorified, water-wasting Markov chain has given me, you now have me looking up that information anyway to confirm it's correct. That's... not quite doubling the work involved, but it's adding an unnecessary step.

I could have searched for the source in the first instance. I could have gone to the library and asked for help.

We spent time coming up with a question ("prompt engineering"! hah!), we used up a bunch of electricity for an answer to be generated, and now you... want me to go and search that answer to find the source? Why did we do the first step?

People got undergraduate degrees - hell, even PhDs - before generative AI.

Look up the tweet from someone who said "Sometimes when coming up with a good prompt for ChatGPT, I sometimes come up with the answer myself without needing to submit".


Verifying information is an order of magnitude easier than compiling it or synthesizing it in the first place. Prompt engineering is an order of magnitude easier still. This is obvious to most people, but apparently it needs to be said.

An entire day of generating responses with ChatGPT uses less water and energy than your morning shower. You seem terribly concerned about signaling the virtues of abstaining from technology use on behalf of purported resource misuse, yet you're sitting at a computer typing away.

You're not a serious person, and you're wasting everyone's time. Please leave the internet and go play with rocks in a cave.


You made a new account just to post this; I'm flattered! Perhaps your normal account is tied to your professional identity?

Do take care.


Sometimes you don't need sources to verify something is correct; it's something you can directly verify. To reduce it to the easiest version of this: I ask for code to do something, it writes me code, I run my unit test, it passes, my time is saved!

For other things, it depends, but if I'm asking it to do a survey I can look at its results and see if they fit what I'm looking for, check the sources it gives me, etc. People pay analysts/paralegals/assistants to do exactly this kind of work all the time expecting that they will need to check it over. I don't see how this is any different.

I don't think the library/electricity responses are serious, but to move on to the point about degrees... people also got those degrees before calculators, before computers, before air travel, before video calls, before the internet, before electricity, yet all of those things assist in creating knowledge. I think it's perfectly reasonable to look at these LLMs/chat assistants in the same light: as a tool that can augment human productivity in its own way.


I'm interested to hear more about how you can verify information without a source. What are you looking at when you search for the verification, exactly?

Some code or maths proofs can be self-supporting with things like unit tests or proof checkers, for example.
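
A minimal sketch of that kind of source-free check, assuming Python's unittest and a hypothetical llm_sort function standing in for whatever code the model produced:

    import unittest

    def llm_sort(xs):
        # hypothetical stand-in for model-generated code
        return sorted(xs)

    class TestLlmSort(unittest.TestCase):
        def test_sorts(self):
            self.assertEqual(llm_sort([3, 1, 2]), [1, 2, 3])

        def test_empty_input(self):
            self.assertEqual(llm_sort([]), [])

    if __name__ == "__main__":
        unittest.main()

If the tests pass, the code has checked itself against the spec you wrote, no external source needed.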

But is it?

> We'd never hire someone who just makes stuff up

We do all the time - of course we do, all the time.


You can use them for whatever you like, or not use them. Everyone has a different bar for when technology is useful. My dad doesn't think EVs are useful due to the long charge times, but there are others who find it fully acceptable.

This doesn’t make LLMs worthless, you just need to structure your processes around fallibility. Much like a well designed release pipeline is built with the expectation that devs will write bugs that shouldn’t ship.

$3k a month vs. ~$500 a month. That's all you need to know. Not saying it's as good, but it's all some managers care about.

Yeah, I used to hire people, but then one of them made a mistake, so now I'm done with them forever; they are useless. It is not I, who is directing the workers, who cannot create a process that is resistant to errors; it's definitely the fact that all people are worthless until they make no errors, as there truly is no other way of doing things than telling your intern to do a task and having them send it directly to the production line.

LLMs are "great" in some use cases, "ok" in others, and "laughable" in more.

Some people might find $500 worth of value, in their specific use case, in those "great" and "ok" categories, where they get more value than "lies" out of it.

A few verifiable lies, vs hours of time, could be worth it for some people, with use cases outside of your perspective.


I disagree that this is a useful springboard. And I say that as an AI optimist.

A report full of factual errors that a careful intern wouldn't make is worse than useless (yes, yes, I've mentored interns).

If the hard part is the language, then do the research yourself, write an outline, and have the LLM turn it into complete sentences. That would at least be faster.

Here's the thing, though: If you do that, you're effectively proving that prose style is the low-value part of the work, and may be unnecessary. Which, as much as it pains me to say as a former English major, is largely true.


What's faster? Writing a 500-word report "from scratch" by researching the topic yourself, vs. having AI write it and then having to fact-check every answer and correct each piece manually?

This is why I don't use AI for anything that requires a "correct" answer. I use it to re-write paragraphs or sentences to improve readability etc, but I stop short of trusting any piece of info that comes out from AI.


> Then it cited an answer I gave on Stack Overflow [...] using this as evidence for my technical expertise. Turns out that about 15 years ago, I _asked_ a question on this topic, but the answer was submitted by someone else

Artificial dementia...

Some parties are releasing products well before they are able to ship products that work well (I am not sure their legal cover will be so solid), but database-aided outputs could and should become a strong check on that phenomenon of remembering badly. Very linearly, like humans: get an idea, then compare it to the data - it is due diligence and part of the verification process in reasoning. It is as if moves outside pure, linear product-progress reasoning are swaying the R&D towards directions outside the primary concerns. It's a form of procrastination.


> Pretty neat, but definitely watch out for hallucinations.

That would be exactly my verdict of any product based on LLMs in the past few years.


Interesting!

I wonder if it's carried over too much of that 'helpful' DNA from 4o's RLHF. In that case, maybe asking for 500 words was the difficult part: it just didn't have enough to say based on one SO post and one article, but the overall directives assume there is enough, and so the model is put in a position where it must publish...

Put another way, it seems this model faithfully replicates the incentives most academics have — publish a positive result, or get dinged. :)

Did it pick up your HN comments? Kadoa claims that's more than enough to roast me, ... and it's not wrong. It seems like there's enough detail about you (or me) there to do a better job summarizing.


I didn't actually give it a goal of any particular length, but I do think that, given my not-so-large online footprint, it may have felt "pressured" to generate content that simply isn't there.

It didn't pick up my HN comments, probably because my first and last name are not in my profile, though obviously that is my handle in a smooshed-together form.


This is very bearish for current AI. It seems like 99% reliability is still too low when errors compound. But I wonder if this is inherent to longer contexts or if it just depends on how it's trained. In theory, longer context => more errors.

Although I think people are the same: give them too big a problem and they get lost unless they take it in bites. So it seems like OpenAI's implementation is just bad, because o3's hallucination benchmark shouldn't lead to such poor performance.


Interesting

You might find it amusing to compare it to: https://hn-wrapped.kadoa.com/timabdulla

(Ref: https://news.ycombinator.com/item?id=42857604)


This is... very uncomfortable. An (expanded) AI summary of my HN and reddit usage would appear to be a pretty complete representation of my "online" identity/character. I remember when people would browse your entire comment history just to find something to discredit you on reddit, and that behavior was _heavily_ discouraged. Now, we can just run an AI model to follow you and sentence you to a hell of being permanently discredited online. Give it a bunch of accounts to rotate through, send some voting power behind it (reddit or hn), and just pick apart every value you hold. You could obliterate someone's will to discuss anything online. You could effectively silence all but the most stubborn, and those people you would probably drive insane.

It's a very interesting use case though: filter through billions of comments and give everyone a score on which real-life person they probably are. I wonder if, say, Ted Cruz hides behind a username somewhere.


throwaway/anonymous.

not just for when discussion of the content, rather than the personality behind it, is important.


I put my profile in [0] and it's mostly silly; a few comments extracted and turned into jokes. No deep insights into me, and my "Top 3 Technologies" are hilariously wrong (I've never written a single line of TypeScript!)

[0]: https://hn-wrapped.kadoa.com/dlivingston


That... seems to just take a few (three or four) random comments that received some attention and then extrapolate an entire profile by (incorrectly) interpreting their contents?

https://hn-wrapped.kadoa.com/ComputerGuru


So, I still think this is a cool tool for search reasons, but otherwise the tendency to hallucinate makes it questionable as a researcher.

Hypothetically speaking, if the time you saved is now spent verifying the statements of your AI researcher, then did you really save any time at all?

If the answers aren't important enough to verify, then was it ever even important enough to actually research to begin with?





