I just gave it a whirl. Pretty neat, but definitely watch out for hallucinations. For instance, I asked it to compile a report on myself (vain, I know.) In this 500-word report (ok, I'm not that important, I guess), it made at least three errors.
It stated that I had 47,000 reputation points on Stack Overflow -- quite a surprise to me, given my minimal activity on Stack Overflow over the years. I popped over to the link it had cited (my profile on Stack Overflow) and it seems it confused my number of people reached (47k) with my reputation, a sadly paltry 525.
Then it cited an answer I gave on Stack Overflow on the topic of monkey-patching in PHP, using this as evidence for my technical expertise. Turns out that about 15 years ago, I _asked_ a question on this topic, but the answer was submitted by someone else. Looks like I don't have much expertise, after all.
Finally, it found a gem of a quote from an interview I gave. Or wait, that was my brother! Confusingly, we founded a company together, and we were both mentioned in the same article, but he was the interviewee, not I.
I would say it's decent enough for a springboard, but you should definitely treat the output with caution and follow the links provided to make sure everything is accurate.
"Pretty neat, but definitely watch out for hallucinations."
We'd never hire someone who just makes stuff up (or at least we wouldn't keep them employed for long). Why are we okay with calling "AI" tools like this anything other than curious research projects?
Can't we just send LLMs back to the drawing board until they have some semblance of reliability?
Not sure if this was posted as humour, but I don't feel that way. In today's world, where I certainly would consider taking the blue pill, I'm having a blast with LLMs!
It has helped me learn things incredibly fast. I find LLMs especially useful for filling gaps in my knowledge and exploring new topics in my own way and language, without needing to wait for an answer from a human (which could also be wrong).
Why does it feel to you that "we are entirely inside the bubble"?
In the early days of ChatGPT, when it seemed like this fun new thing, I used it to "learn" C. I don't remember anything it told me, and none of the answers it gave me were anything I couldn't have found elsewhere in different forms - heck, I could have flipped open Kernighan & Ritchie to the right page and got the answer.
I had a conversation with an AI/Bitcoin enthusiast recently. Maybe that already tells you everything you need to know about this person, but to hammer the point home, they made a claim similar to yours: "I learn much more and much better with AI". They also said they "fact check" the things it "tells" them. Some moments later they told me "Bitcoin has its roots in Occupy Wall Street".
A simple web search tells you that Bitcoin was conceived a full two years before Occupy. How can they be related?
It's a simple error that can be fact checked simply. It's a pretty innocuous falsity in this particular case - but how many more falsehoods have they collected? How do those falsehoods influence them on a day-by-day basis?
How many falsehoods influence you?
A very well-meaning activist posted a "comprehensive" list of all the programs that were to be halted by the grants and loans freezes last week. Some of the entries on the list weren't real, or weren't related to the freeze. They revealed they had used ChatGPT to help compile the list and had then gone down it one by one to verify each entry.
Even with such meticulous attention to detail, incorrect information still filtered through.
I guess the real learning happens outside the AI, here in real life. Does the code run? Sure, it's on my local machine and not in production, but I would never have had the patience to get "that new thing working" without AI as an assistant.
Does the food taste good? Oops, there are a few too many vegetables here; they are never gonna fit in this pan of mine. Not a big deal, next time I'll be wiser.
AI is like a hypothesis machine. You're gonna have to figure out if the output is true. A few years ago, just testing any machine's "intelligence" was pretty quickly done, and the machine failed miserably. Now the accuracy is astonishing in comparison.
> How many falsehoods influence you?
That is a great question. The answer is definitely not zero. I try to live with a hacker mentality and I'm an engineer by trade. I read news and comments, which I'm not sure is good for me. But you also need some compassion towards yourself. It's not like ripping everything open will lead to salvation. I believe the truth does set you free, eventually. But all in one's own time...
Anyway, AI is a tool like any other. Someone will hammer their fingers with it. I just don't understand the hate. It's not like we're drinking any AI Kool-Aid here. It's just like it was 30 years ago (in my personal journey): you had a keyboard and a machine, you asked it things and got gibberish. Now the conversation with it has just started to get interesting. Peace.
>It has helped me learn things incredibly fast. I find LLMs especially useful for filling gaps in my knowledge and exploring new topics in my own way and language
and then you verify every single fact it tells you via traditional methods by confirming each one in human-written documents, right?
Otherwise, how do you use the LLM for learning? If you don't know the answer to what you're asking, you can't tell if it's lying. It also can't tell if it's lying, so you can't ask it.
If you have to look up every fact it outputs after it does, using traditional methods, why not skip to just looking things up the old fashioned way and save time?
Occasionally an LLM helps me surface unknown keywords that make traditional searches easier, but they can't teach anything because they don't know anything. They can imagine things you might be able to learn from a real authority, but that's it. That can be useful! But it's not useful for learning alone.
And if you're not verifying literally everything an LLM tells you.. are you sure you're learning anything real?
I guess it all depends on the topic and levels of trust. How can I be certain that I have a brain? I just have to take something for granted, don't I? Of course I will "verify" the "important stuff", but what is important? How can I tell? Most of the time the only thing I need is a pointer in the right direction. Wrong advice? I'll know when I get there, I suppose.
I can remember numerous things I was told while growing up that aren't actually true, either through plain lies and rumours or because of the long list of our cognitive biases.
> If you have to look up every fact it outputs after it does, using traditional methods, why not skip to just looking things up the old fashioned way and save time?
What is the old-fashioned way? I mean, people learn "truths" these days from TikTok and YouTube. Some of the stuff is actually very good; you just have to distill it based on what you were taught at school. Nobody has yet declared LLMs a substitute for schools, maybe they soon will, but neither "guarantees" us anything. We could just as well be taught political agendas.
I could order a book about construction, but I wouldn't build a house without asking a "verified" expert. Some people build anyway and we get some catastrophic results.
Levels of trust: it's all games and play until it gets serious, like what to eat or doing something that involves life-threatening physics. I take it as playing with a toy. Surely something great has come from only a few pieces of Lego?
> And if you're not verifying literally everything an LLM tells you.. are you sure you're learning anything real?
I guess you shouldn't do it that way. But really, so far the topics I've rigorously explored with ChatGPT for example, have been better than your average journalism. What is real?
Saying you need to verify "literally everything" both overestimates the frequency of hallucinations and underestimates the amount of wrong found in human-written sources. e.g. the infamous case of Google's AI recommending Elmer's glue on pizza was literally a human-written suggestion first: https://www.reddit.com/r/Pizza/comments/1a19s0/my_cheese_sli...
> without needing to wait for an answer from a human (which could also be wrong).
The difference is you have some reassurances that the human is not wrong - their expertise and experience.
The problem with LLMs, as demonstrated by the top-level comment here, is that they constantly make stuff up. While you may think you're learning things quickly, how do you know you're learning them "correctly", for lack of a better word?
Until LLMs can say "I don't know", I really don't think people should be relying on them as a first-class method of learning.
"Occasional nonsense" doesn't sound great, but would be tolerable.
Problem is - LLMs pull answers from their behind, just like a lazy student on an exam. "Hallucinations" is the word people use to describe this.
Those are extremely hard to spot - unless you happen to know the right answer already, at which point - why ask? And those are everywhere.
One example - recently there was quite a discussion about LLMs being able to understand (and answer) base16 (aka "hex") encoded questions on the fly, so I went on to try base64, gzipped base64, zstd-compressed base64, etc...
To my surprise, the LLM got most of those encodings/compressions right, decoded/decompressed the question, and answered it flawlessly.
But with a few encodings, the LLM detected base64 correctly, identified the compression algorithm correctly, and then... instead of decompressing, made up a completely different payload and proceeded to answer that. Without any hint that anything sinister was going on.
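For anyone who wants to repeat the experiment, here is a rough sketch of how the payloads could be built. The question text and prompt wording are placeholders of mine, and the zstd variant assumes the third-party zstandard package:

    # Build the same question in the encodings described above:
    # plain base64, gzip+base64, and zstd+base64.
    import base64
    import gzip

    import zstandard  # third-party: pip install zstandard

    question = "What is the capital of Finland?"  # placeholder question
    raw = question.encode("utf-8")

    payloads = {
        "base64": base64.b64encode(raw).decode("ascii"),
        "gzip+base64": base64.b64encode(gzip.compress(raw)).decode("ascii"),
        "zstd+base64": base64.b64encode(
            zstandard.ZstdCompressor().compress(raw)
        ).decode("ascii"),
    }

    for name, payload in payloads.items():
        # Paste a line like this into the chat and check whether the model
        # actually decodes the payload or quietly invents a different question.
        print(f"[{name}] Decode this, then answer the question: {payload}")

Asking the model to restate the decoded question before answering is the quickest way to catch the silent substitution described above.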
We really need LLMs to reliably calculate and express confidence. Otherwise they will remain mere toys.
I think as these things get more integrated into customer service workflows - especially for things like insurance claims - there's gonna start being a lot more buyer's remorse on everyone's part.
We've tried for decades to turn people into reliable robots, now many companies are running to replace people robots with (maybe less reliable?) robot-robots. What could go wrong? What are the escalation paths going to be? Who's going to be watching them?
It's given you some information and now you have to seek out a source to verify that it's correct.
Finding information is hard work. It's why librarian is a valuable skilled profession. What you've done by suggesting that I should "verify" or "proofread" what a glorified, water-wasting Markov chain has given me now entails me looking up that information to verify that it's correct. That's...not quite doubling the work involved but it's adding an unnecessary step.
I could have searched for the source in the first instance. I could have gone to the library and asked for help.
We spent time coming up with a question ("prompt engineering"! hah!), we used up a bunch of electricity for an answer to be generated and now you...want me to search up that answer to find the source? Why did we do the first step?
People got undergraduate degrees - hell, even PhDs - before generative AI.
Look up the tweet from someone who said "Sometimes when coming up with a good prompt for ChatGPT, I sometimes come up with the answer myself without needing to submit".
Verifying information is an order of magnitude easier than compiling it or synthesizing it in the first place. Prompt engineering is an order of magnitude easier still. This is obvious to most people, but apparently it needs to be said.
An entire day of generating responses with ChatGPT uses less water and energy than your morning shower. You seem terribly concerned about signaling the virtues of abstaining from technology use on behalf of purported resource misuse, yet you're sitting at a computer typing away.
You're not a serious person, and you're wasting everyone's time. Please leave the internet and go play with rocks in a cave.
Sometimes you don't need sources to verify something is correct; it's something you can verify directly. To reduce it to the easiest version of this: I ask for code to do something, it writes me code, I run my unit test, it passes, my time is saved!
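To make that concrete, a minimal sketch of the workflow: the slugify function below is a hypothetical stand-in for whatever the model returned, and the tests are the part you write yourself.

    import re
    import unittest


    def slugify(text: str) -> str:
        # Stand-in for the model's output.
        text = text.strip().lower()
        text = re.sub(r"[^a-z0-9]+", "-", text)
        return text.strip("-")


    class TestSlugify(unittest.TestCase):
        # These tests are mine; a passing run is the direct verification.
        def test_basic(self):
            self.assertEqual(slugify("Hello, World!"), "hello-world")

        def test_collapses_separators(self):
            self.assertEqual(slugify("  a  --  b  "), "a-b")


    if __name__ == "__main__":
        unittest.main()

A green run verifies only the behaviour the tests cover, but that is often all the verification the task needs.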
For other things, it depends, but if I'm asking it to do a survey I can look at its results and see if they fit what I'm looking for, check the sources it gives me, etc. People pay analysts/paralegals/assistants to do exactly this kind of work all the time expecting that they will need to check it over. I don't see how this is any different.
I don't think the library/electricity responses are serious, but to move on to the point about degrees... people also got those degrees before calculators, before computers, before air travel, before video calls, before the internet, before electricity, yet all of those things assist in creating knowledge. I think it's perfectly reasonable to look at these LLMs/chat assistants in the same light: as a tool that can augment human productivity in its own way.
I'm interested to hear more about how you can verify information without a source. What are you looking at when you search for the verification, exactly?
You can use them for whatever you like, or not use them. Everyone has a different bar for when technology is useful. My dad doesn't think EVs are useful due to the long charge times, but there are others who find it fully acceptable.
This doesn’t make LLMs worthless, you just need to structure your processes around fallibility. Much like a well designed release pipeline is built with the expectation that devs will write bugs that shouldn’t ship.
Yeah, I used to hire people, but then one of them made a mistake, now I'm done with them forever, they are useless. It is not I, who is directing the workers, who cannot create a process that is resistant to errors, it's definitely the fact that all people are worthless until they make no errors as there truly is no other way of doing things other than telling your intern to do a task then having them send it directly to the production line.
LLM are "great" in some use cases, "ok" in others, and "laughable" in more.
Some people might find $500 worth of value, in their specific use case, in those "great" and "ok" categories, where they get more value than "lies" out of it.
A few verifiable lies, vs hours of time, could be worth it for some people, with use cases outside of your perspective.
I disagree that this is a useful springboard. And I say that as an AI optimist.
A report full of factual errors that a careful intern wouldn't make is worse than useless (yes, yes, I've mentored interns).
If the hard part is the language, then do the research yourself, write an outline, and have the LLM turn it into complete sentences. That would at least be faster.
Here's the thing, though: If you do that, you're effectively proving that prose style is the low-value part of the work, and may be unnecessary. Which, as much as it pains me to say as a former English major, is largely true.
What's faster? Writing a 500 word report "from scratch" by researching the topic yourself, vs. having AI write it then having to fact check every answer and correct each piece manually?
This is why I don't use AI for anything that requires a "correct" answer. I use it to re-write paragraphs or sentences to improve readability etc, but I stop short of trusting any piece of info that comes out from AI.
> Then it cited an answer I gave on Stack Overflow [...] using this as evidence for my technical expertise. Turns out that about 15 years ago, I _asked_ a question on this topic, but the answer was submitted by someone else
Artificial dementia...
Some parties are releasing products well before they are able to ship products that work well (and I am not sure their legal cover will be so solid), but database-aided outputs should and could become a strong limit on this phenomenon of remembering badly. Very linearly, like humans: get an idea, then compare it to the data - it's due diligence and part of the verification process in reasoning. It is as if moves outside linear, pure product-progress reasoning are swaying the R&D in directions beyond the primary concerns. It's a form of procrastination.
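As a loose sketch of that "get an idea, then compare it to the data" step (nothing here is from any particular product; the claim, the source text, and the matching rule are all made up for illustration), even a naive check that the figures in a generated claim actually occur in the cited source would flag the kind of error described at the top of the thread:

    import re


    def unsupported_numbers(claim: str, source_text: str) -> list[str]:
        # Return figures that appear in the claim but nowhere in the source.
        numbers = re.findall(r"\d[\d,]*", claim)
        normalized_source = source_text.replace(",", "")
        return [n for n in numbers if n.replace(",", "") not in normalized_source]


    claim = "The profile shows 47,000 reputation points."  # made-up model output
    source = "Stack Overflow profile: reputation 525, people reached 47k."  # made-up source

    missing = unsupported_numbers(claim, source)
    if missing:
        print("Needs human review; figures not found in the source:", missing)
    else:
        print("All figures appear in the source (which still doesn't prove they're labelled right).")

A passing check would only mean the figure exists somewhere in the source, not that it is labelled correctly (exactly the 47k reputation mix-up above), so it narrows human review rather than replacing it.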
I wonder if it has carried over too much of that 'helpful' DNA from 4o's RLHF. In that case, maybe asking for 500 words was the difficult part -- it just didn't have enough to say based on one SO post and one article, but the overall directives assume there is enough, and so the model is put in a position where it must publish...
Put another way, it seems this model faithfully replicates the incentives most academics have — publish a positive result, or get dinged. :)
Did it pick up your HN comments? Kadua claims that’s more than enough to roast me, … and it’s not wrong. It seems like there’s enough detail about you (or me) there to do a better job summarizing.
I didn't actually give it a goal of writing any particular length, but I do think that perhaps given my not-so-large online footprint, it may have felt "pressured" to generate content that simply isn't there.
It didn't pick up my HN comments, probably because my first and last name are not in my profile, though obviously that is my handle in a smooshed-together form.
This is very bearish for current AI. It seems like 99% reliability is still too low once errors compound. But I wonder if this is inherent to longer contexts or just depends on how the model is trained. In theory, longer context => more errors.
Although I think people are the same: give someone too big a problem and they get lost unless they take it in bites. So it seems like OpenAI's implementation is just bad, because o3's hallucination benchmark shouldn't lead to such poor performance.
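To put a rough number on the compounding-errors point (purely illustrative: it assumes a flat 99% per-claim accuracy and independent errors):

    # If each claim in a report is independently right 99% of the time, the
    # chance the whole report is error-free drops quickly with length.
    per_claim_accuracy = 0.99

    for claims in (10, 50, 100, 500):
        p_all_correct = per_claim_accuracy ** claims
        print(f"{claims:>3} claims -> {p_all_correct:.1%} chance of zero errors")

A 500-word report can easily contain dozens of individual claims, so even 99% per claim leaves a real chance that at least one error slips through.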
This is... very uncomfortable. An (expanded) AI summary of my HN and reddit usage would appear to be a pretty complete representation of my "online" identity/character. I remember when people would browse your entire comment history just to find something to discredit you on reddit, and that behavior was _heavily_ discouraged. Now, we can just run an AI model to follow you and sentence you to a hell of being permanently discredited online. Give it a bunch of accounts to rotate through, send some voting power behind it (reddit or hn), and just pick apart every value you hold. You could obliterate someone's will to discuss anything online. You could effectively silence all but the most stubborn, and those people you would probably drive insane.
It's a very interesting use case though: filter through billions of comments and give everyone a score on which real-life person they probably are. I wonder if, say, Ted Cruz hides behind a username somewhere.
I put my profile in [0] and it's mostly silly; a few comments extracted and turned into jokes. No deep insights into me, and my "Top 3 Technologies" are hilariously wrong (I've never written a single line of TypeScript!)
That.. seems to just take a few (three or four) random comments that received some attention and then extrapolate an entire profile based on (incorrectly) interpreting their contents?