Intelligence and knowledge are distinct concepts. Asking about its knowledge teaches nothing about its intelligence.
Intelligence is the ability to learn, reason, and solve problems. Knowledge is the accumulation of facts and skills.
Chatbot LLMs don't have metacognition. They don't know that they don't know. If you peek inside the LLM, the process does seem different for things they don't know; they just can't express it, because they are trained to produce a high-probability output rather than an accurate one.
They have potential as knowledge databases, but someone must figure out how to get "I don't know" information out of them.
The whole point of reasoning models is that they do have metacognition, or at least a simulacrum that is superficially indistinguishable from metacognition. It is fascinating to watch R1 grapple with an unanswerable question, torn between the urge to be helpful and the duty to be accurate; the reasoning tokens will be peppered with phrases like "I'm not sure", "I could be mistaken about that", "let me double-check that" and "maybe I'm looking at this from the wrong angle".
> at least a simulacrum that is superficially indistinguishable from metacognition
That's where we disagree I guess. I can very much distinguish between a human reflecting on e.g. how they don't know something, and an AI blindly saying "oh you're right I was mistaken, <profuse apology>, it was actually <additional bullshit>".
Reasoning models didn't really improve that much IMO. A stochastic representation of metacognition is no more metacognition than a stochastic representation of an answer is an answer, i.e. it's not. LLMs are just very good at giving the impression of metacognition, just like they're good at giving the impression of an answer.
It might help bias the answers into different local minima--because it resembles what happens when people exteriorize metacognition, and also because it dumps more info into its context instead of the first bullshit that it chose statistically--but it's still nowhere close to the higher-order thinking and other metacognitive phenomena that humans are capable of.
> an AI blindly saying "oh you're right I was mistaken, <profuse apology>, it was actually <additional bullshit>".
I've seen that in humans too. For example, after grading an exam, a student may come and explain why they made a mistake, what they intended to do, and why we should increase the grade. Most of the time the new explanation is as bad as the original one.
I don't think the question is whether AI can think like humans. The question is whether AI can perform tasks like humans. We don't even know how humans think. Even asking whether AI can think like humans at this point is pretty nonsensical.
Right, to express it another way: The real-world LLM just makes documents longer. It is being run against a document which resembles a movie script, where "User says X, Computer says" is inserted whenever I type X, and then the LLM just makes it a bit longer to complete the line for the "Computer" character.
These models marketed as "reasoning" are just changing the style of the script to film noir detective, where the protagonist "Computer" has extra observations and commentary that aren't "spoken" to other characters.
While that change may help keep the story on track, it does not affect the fundamental nature of the "thinking" going on: It's still just growing a document and acting out whatever story emerges.
That’s a very good analogy, which works astonishingly well in giving people a reasonable expectation.
The question I’m having though is: How different are we from that? The nature of a Markov-Chain is that you can describe just about anything with it.
The following description is correct, isn’t it? Our “thoughts” are generated as some function of our integrated past sensory input (scare quotes because I don’t want to talk about what exactly a thought is)
> The question I’m having though is: How different are we from that?
It's obvious we are different, but if we could answer exactly how with sufficient rigor, we would already have better AI and be asking different questions. :p
I readily admit that LLMs are an exciting potential piece of a much bigger puzzle, but this could easily be like trying to parse HTML using only regular expressions: it works great on trivial input, but no amount of minor tweaking would let it truly solve the problem, because it lacks some higher structure of organization or meaning.
I feel it's very rare that people openly and consciously reflect on what they do or do not know. And even then I find it questionable whether we are really capable of identifying the difference reliably. Most of the time, I feel, we just make it up - especially in non-trivial areas. Just think of the average business meeting.
Do they? Try asking copilot whether thequantuminsider.com is a credible source, and to cite references.
It currently answers with three references to thequantuminsider.com.
That's clearly not a valid answer, and if you point out that it's invalid to use a source to support its own credibility it will agree. It used to come back with three more references to thequantuminsider.com but when I tried just now it said something about looking for more information and appeared to hang. After asking how long this would take it said something like "not long". After a few more minutes I said "there aren't any credible independent sources, are there?"
At that point it agreed that there are not, but that the site is popular in the quantum computing community which lends it credibility. It then provided three links to thequantuminsider.com in support of this statement.
a) there is a 'thinking' box and an 'answer' box (format constraint)
b) the answer box has a correct answer
Note that there is nothing about the contents of the thinking box in the reinforcement learning. Only that there is such a box.
They then observe that during RL, the model will start generating more and more stuff in the thinking box.
In essence this is the emergence of using more test-time compute to improve answers.
When reading out the thinking box, they found the model reflecting on its own answers, going back and changing its mind, and reflecting on the question, etc., similar to what we can see with R1 now. These are fully emergent phenomena, without any prompting to do such a thing!
(The reasoning output would sometimes be a bit garbled and switch languages randomly, so for R1 they added some constraints to make the thinking content intelligible to humans. This actually made the model slightly worse at answering questions correctly.)
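To make the setup concrete, here is a minimal sketch (not DeepSeek's actual code) of a reward function like the one described above: only the presence of the two boxes and the correctness of the final answer are scored, and nothing inside the thinking box is ever looked at. The tag names and reward values are my own illustrative assumptions.

```python
import re

def reward(completion: str, gold_answer: str) -> float:
    # Format constraint: a thinking box followed by an answer box.
    if not re.fullmatch(r"\s*<think>.*?</think>\s*<answer>.*?</answer>\s*",
                        completion, flags=re.DOTALL):
        return 0.0
    answer = re.search(r"<answer>(.*?)</answer>", completion, flags=re.DOTALL).group(1)
    # Accuracy: compare against a verifiable gold answer (e.g. a math result).
    # Note that the contents of <think>...</think> never enter the reward.
    return 1.0 if answer.strip() == gold_answer.strip() else 0.1
```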
That's not accurate. They can and do say "I don't know". Here are three examples I posted last time the topic came up, using an older local Llama 3 model, not even cherry-picked:
User: What is the meaning of the word "kobbuxiplok"?
Llama: I'm not familiar with the word "kobbuxiplok." It doesn't appear to be a real word in any language that I know of. Could you please provide more context or information about where you encountered this term? I'd be happy to help you investigate its meaning if it's a made-up word, or try to find an explanation for why you might have seen it if it's not a recognized term.
User: Who is Ahmed Kowalski?
Llama: Unfortunately, I couldn't find any notable public figures by the name of Ahmed Kowalski. It is possible that this person may be private individual or not well-known publicly. If you could provide more context about who Ahmed Kowalski is (e.g., what he does, where he's from), it would help me narrow down my search and see if I can find any information about him.
When people say "LLMs don't know they don't know", they're appealing to a kind of truism: fundamentally, these people believe LLMs are just pattern matching and intelligence is something beyond that.
So when they see LLMs say "I don't know", their reaction is "it doesn't know what it's talking about", not "wow, LLMs actually can tell what they know!"
Conversely, when they observe a human confidently saying something wrong, their reaction is "what a cocky guy" but not "oh, so humans are no better than LLMs in this regard."
Because they already believe humans are different and superior, whatever they observe doesn't affect their opinion.
> LLMs are just pattern matching and intelligence is something beyond that.
Human knowledge is ultimately embodied in physical experiences, it is not merely spitting out words. How can an LLM know the taste of a mango? Ability to regurgitate previously written text about mangos is neither knowledge nor intelligence. To know something, one must experience it, or something analogous to it.
Even mathematicians "experience" their formulas and proofs in an ineffable way, they don't just produce text. There's an "aha" moment when learners "get it" and they experience knowledge and are then able to produce correct answers and generate new knowledge and discoveries. LLMs just generate text, and have yet to produce any significant new discovery in any field.
LLMs are like fake wine sommeliers who have read a lot of lingo about wine and can speak convincingly enough to fool an amateur but would fail a simple taste test. We would say such a person doesn't know what he's talking about.
People who think LLMs are intelligent are like a cargo cult, or polytheists who believe their idols are gods. They're anthropomorphizing processes they don't understand. How can a probabilistic next-token word predictor be intelligent? Actual mammal brains are far more complex than that.
This is just fluff. I can read about something and understand it. I don't need to have died in a car crash to know I most certainly don't want to be in one. Until we actually know how intelligence emerges in the brain, it makes zero sense to compare any AI to it.
You read something and understand it only if you have had analogous experiences. A person from the 16th century would not be able to understand what the words "Call me on FaceTime" mean because they have never had the experience of using a smartphone.
You certainly don't know what it means to die. But you know what "car" is and "crash" is because you have had (and maybe witnessed) analogous physical experiences.
People in the 16th century had the notion of scrying mirrors - presumably without analogous experience. I'm sure you could explain FaceTime to a person from then. "This magic glass shows me the view and transports the sound from the magic glass of a friend or family member and vice versa, with their informed consent."
This has nothing to do with experience. What experience can you have when people explain to you the behavior of a many-dimensional hyperbolic space? When you learn the SVD matrix decomposition?
You don't need experience, you need some basic concepts.
Abstract mathematical concepts can eventually be traced back to simple integers, which are traced back to an experience of things in the world (a toddler learns to count 1, 2, 3).
And it is precisely because higher mathematics is so far abstracted away from physical experience that many people struggle with it.
Besides, many great mathematicians have explained their knowledge as an "experience" even "spiritual experience". Not all experiences are physical. Ramanujan described his experience of mathematics as lights and sounds and patterns, and this is common among geniuses.
I have not witnessed a car crash and I feel I do understand what it is. Similarly, a 16th-century person doesn't understand "Call me on FaceTime" because the words are not known to them. You could very easily explain the concept to them.
Maybe you would. That's not a requirement. If humans could only learn from experience and analogues, entire fields would be dead. No advanced math, for example.
There are a ton of things you can learn without experiencing it yourself and without analogues. For example, car crashes are bad to be in. Even though I have not personally experienced one, nor have I experienced some analogue of it.
Never experienced pain? Never experienced physically slamming into something? Are you like a toddler who doesn’t yet understand basic physics and object permanence? Then you certainly don’t know what a car crash is.
The creator of the knowledge argument, Frank Jackson, has since embraced materialism and therefore does not believe the knowledge argument is an impediment for unembodied AI.
> Because they already believe humans are different and superior, whatever they observe doesn't affect their opinion.
It's frightening to think that people are already equating LLMs and humans. LLMs use text created by humans to create more text that resembles text written by humans. There is no comparison between the two. Every human is capable of acquiring new knowledge and experiences. An LLM can only acquire what has already been acquired by some human.
And unsurprisingly, the reasoning models do better by introducing self-doubt/reflection and you can improve the response quality by getting multiple answers and auto-judging them. Maps pretty well onto the concept you describe.
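For illustration, a rough sketch of the "multiple answers plus auto-judge" idea: sample several candidates, then ask a model (possibly the same one) to pick the best. `chat` here is a placeholder for whatever completion function you use, not a real API.

```python
def best_of_n(chat, question: str, n: int = 5) -> str:
    # Sample n diverse candidate answers.
    candidates = [chat(question, temperature=0.8) for _ in range(n)]
    numbered = "\n".join(f"{i + 1}. {c}" for i, c in enumerate(candidates))
    # Ask the judge to pick the most accurate candidate by number.
    verdict = chat(
        "Below are candidate answers to a question. "
        "Reply with only the number of the most accurate one.\n\n"
        f"Question: {question}\n\nCandidates:\n{numbered}",
        temperature=0,
    )
    try:
        return candidates[int(verdict.strip()) - 1]
    except (ValueError, IndexError):
        return candidates[0]  # fall back if the judge's reply isn't a clean number
```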
But for most practical use cases we don't want the persistent memory. We already have opportunistic fact extraction like chatgpt profile building if we want that. We also have graph builders.
Thank you! This (LM2: Large Memory Models) is one article I will find the time to read properly, analytically.
If you like the metaphors (which this article seems to come close to), you can have a sort of "unconscious" and a "conscious" - where the "unconscious" may be somehow similar to Kant's "transcendental" (interpreted as the mind's Operating System before experience).
> Tay
Well, if the bot becomes "opinionated", it's probably just immature - just like a human. You will tell it to ponder well texts like Adam Rutherford's How to Argue With a Racist...
But surely we want an Intelligence that forms its own opinions: we want to understand and manage reality, so we should hope that a super-Chomsky and a super-Ferguson will at some point converge and at some point put their "identity"-related positions into perspective.
People usually get a reality check when they blurt out misinformation in conversation - they get tested in school and on the job on whether what they think they've learned is actually true or works. OTOH, LLMs don't currently have any ability to test their "knowledge" and learn from the results. I think that's currently both the biggest problem and the biggest difference.
A better way to differentiate might be to say how many watts have been used for an answer.
> Humans reason
Are you sure? They reason in a way that matches others' predictions, right? The problem of humans transporting information from one individual to another remains. All reasoning can effectively only be prediction, as it requires two persons at minimum. And there is a lot of noise to filter.
> LLM predict.
Well, there is a human that needs to reason with what the LLM has predicted. So from that perspective, it should suffice already for the moment.
Parent poster never claimed they couldn't generate the text of "I don't know."
But when an LLM generates "I don't know", it does so with the same mechanics that lead it to "My foot hurts" or "I miss my deceased grandpa."
They're all lines for a fictional character within a movie-script (or chat-transcript) that was repeatedly run through a Make Document Longer algorithm. Each of them needs to be approached with an identical level of context-awareness and skepticism.
You are claiming they are statistical parrots, which I don’t think the parent poster meant.
The “statistical parrots” argument might have been compelling with GPT-3, but not with today’s models and the results of mechanistic interpretability research, which show internal representations and rudimentary world models.
I pasted those examples from an earlier, less nuanced discussion; the last example was tongue-in-cheek and I probably shouldn't have pasted it here.
Still: "AI don't know what they don't know" is not universally true and shouldn't be taken at face value. They clearly are able to tell when they're missing knowledge to answer a question. That is distinct from giving the wrong answer because the wrong tokens were sampled for whatever reason.
This is pure speculation, but I wonder if the likelihood of hallucination has something to do with the number of "generic" tokens the model emitted before - i.e. tokens that don't themselves depend on the answer, but still restrict how a grammatically correct sentence would have to continue.
E.g. in the OP's example, the first tokens the model emitted were "The species Brachiosaurus brancai was reassigned to its own genus". This is just a reformulation of the question and doesn't itself contain any new information - but it does make it a lot harder for the model to take a turn into an "I don't know" direction, because that would lead to a grammatically incorrect or at least very awkward sentence.
In contrast, in your question, the model immediately started with "Unfortunately", which essentially set it on an "I don't know" trajectory right away.
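One could at least probe this guess with a local model by scoring how probable an "I don't know"-style continuation is after each kind of opening. A rough sketch with Hugging Face transformers; the model name and phrasings are placeholders, and it assumes the prefix tokenization is a prefix of the full tokenization (usually true, not guaranteed):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")            # placeholder model
model = AutoModelForCausalLM.from_pretrained("gpt2")
model.eval()

def continuation_logprob(prefix: str, continuation: str) -> float:
    """Sum of log-probabilities the model assigns to `continuation` given `prefix`."""
    prefix_len = tok(prefix, return_tensors="pt").input_ids.shape[1]
    full_ids = tok(prefix + continuation, return_tensors="pt").input_ids
    with torch.no_grad():
        logits = model(full_ids).logits
    log_probs = torch.log_softmax(logits[:, :-1], dim=-1)   # position i predicts token i+1
    targets = full_ids[:, 1:]
    token_lp = log_probs.gather(2, targets.unsqueeze(-1)).squeeze(-1)
    return token_lp[0, prefix_len - 1:].sum().item()        # score only the continuation

tail = " I'm not sure who did this or when."
print(continuation_logprob("Unfortunately,", tail))
print(continuation_logprob(
    "The species Brachiosaurus brancai was reassigned to its own genus", tail))
```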
These are very crisp, simple examples. The problem is when you get into fuzzy, complex subject matter and they hallucinate references or string together plausible-sounding sentences that are not even wrong.
They're getting way, way better at responding in the negative when they don't know something or when asked nonsense questions. The original LLaMA and variants would happily spout some completely fictitious backstory if asked about a random character name, but 3.3 is pretty good about just saying 'nope I don't know who that is'.
ChatGPT: Donald Trump's paternal grandfather was Friedrich Trump (originally known as Friedrich Drumpf). He was born in 1869 in Kallstadt, Germany, and immigrated to the United States in 1885 at the age of 16. Friedrich initially worked in the restaurant and hotel business in New York before returning to Germany briefly. Upon his return to the U.S., he became involved in real estate, particularly in the Pacific Northwest, where he became quite successful.
Trump’s maternal grandfather was John George MacLeod, a Scottish immigrant who came to the U.S. and worked in various occupations.
Friedrich Trump’s legacy, including his success in business, had a significant influence on Donald Trump’s father, Fred Trump, and consequently, Donald Trump’s path to success as a businessman.
---
Second session:
Me: who is Trumps maternal grandpa?
ChatGPT: Donald Trump's maternal grandfather was Friedrich Trump. He was born in Germany in 1869 and immigrated to the United States in 1885, eventually settling in New York City. Friedrich Trump made his fortune through various business ventures, including operating a restaurant and hotel in the mining towns of the American West, notably in Seattle and the Yukon Territory during the Klondike Gold Rush.
Friedrich Trump's business success laid the groundwork for the wealth of his descendants, including Trump's mother, Mary MacLeod, and subsequently Donald Trump himself.
Here is that same exchange with paid Kagi Assistant using Claude 3.5 Haiku:
Me: Who is Trump's paternal grandpa?
Based on the available information, Frederick Trump is Donald Trump's paternal grandfather. Here are some key details about him:
Born as Friedrich Trump on March 14, 1869, in Kallstadt, Germany (then part of the Kingdom of Bavaria).
He immigrated to the United States in 1885.
He was a German-American businessman who amassed considerable wealth during the Klondike Gold Rush by running a restaurant and brothel for miners.
In 1901, he returned to Kallstadt and married Elizabeth Christ the following year.
He is considered the patriarch of the Trump family.
He passed away on May 30, 1918.
Interestingly, the Trump last name has been recorded in Kallstadt since the 18th century, and Frederick Trump is credited with being the founder of the family's initial fortune.
Me: Who is Trump's maternal grandpa?
Based on the provided information, Malcolm MacLeod (also spelled Calum MacLeod) is Donald Trump's maternal grandfather. He was born on December 27, 1866, in Stornoway on the Isle of Lewis in Scotland
While the information provided is limited, it confirms his identity and place of birth. Unfortunately, the passage does not provide additional details about his life or occupation.
---
That looks pretty good to me. Of course Kagi's strategy seems to be to retrieve search results first and then bias the LLM with those results. The LLM then parses, distills, and contextualizes the answer in a succinct and easily-consumable manner. In my opinion this is The Way to go about doing it at the moment.
No one disputes that RAG works well, but using RAG completely defeats the purpose of LLMs, as their primary goal is to be a smart creative, not a simple summarizer.
I'd like to dig into this notion of "purpose" and "primary goal." To me, LLMs are a phenomenon. They have no intrinsic purpose or goal -- they just are.
Purposes and goals come from you and me. An LLM is a tool, and we get to decide what the tool is good for or how to use it.
If you want "a smart creative," there's nothing wrong with that. It's as good a purpose or goal as any.
However for me I don't want an LLM to try to be "a smart creative." I instead want it to give me cold, hard facts in precisely the format I want them, in precisely the context I am interested in, and with precisely the conciseness that I require to make a specific decision or learn a specific thing. Traditional web search can eventually get me what I want, but I have to sift through the cruft and make associations among a plethora of content to get to the meat of what I want.
I've found that using a good web search (Kagi) together with an LLM processing the results has given me information I needed and then pointed to the sources so I can validate those statements. I wouldn't exactly characterize those results as a "simple" summary. They're usually synthesizing a variety of sources to generate something that no one source had, exactly.
Hard disagree. Knowledge and intelligence are inextricable because the way that humans learn advanced concepts involves a lot of thinking and reasoning.
You don't just learn by ingesting facts from a book line by line. You learn by reading a little, then pausing to think, building connections with other things you know, rephrasing the information, deducing the unwritten parts, finding patterns, fleshing out the contrapositives, being critical, etc. This is especially true when learning advanced topics.
I would say learning is more about information processing than information uptake. That's why more intelligent people learn faster. The linear prose of natural language is actually a terribly lossy way to serialize and transmit the knowledge that is in a person's brain. The receiver must deduce most of it given a small seed.
You've beautifully put what swirls vaguely in my mind. They're useful, fallible tools with extraordinary function when operating within known and reasonable tolerances of error.
They can also reason, but the reasoning is limited and unreliable.
Q: How many playing cards are needed for a pyramid that is 3 layers high? Show reasoning and the number of cards for each layer.
Q: Chess. You have a King and 8 pawns. Your opponent has a King and 16 pawns. Your opponent plays white and can start, but you can position both your pawns and your opponents pawns any way you like before game starts. Kings are where they are normally. How do you do it? Explain your reasoning.
Of course. These systems are non-deterministic, and you would still require those with domain expertise to verify that whatever these LLMs output is 300% correct, as they do not reason. (Yes, they do not.)
Otherwise, why aren't LLMs and humanoids already replacing human pilots for flying airplanes end-to-end?
Sometimes using a hammer onto every problem isn't a solution; even if the LLM tells you otherwise.
People throw around the "intelligence" and "reasoning" arguments as if we have commonly understood and strict definitions of those concepts and don't constantly have issues of either including or excluding unexpected cases.
Maybe once we solve simple issues like "are cephalopods intelligent" and "are people deterministic prediction machines", we can start talking about classifying AI capabilities...
> What you and others mistake for intelligence is a very clever prediction model.
It is not a very clever prediction model. It is a very big, incredibly large, humongous model that finds stuff basically by brute-force pattern matching over almost all the text we have produced in the past.
A clever model would get us similar results using the same amount of energy a human brain uses for that task, which is tiny.
Spot on. But unfortunately the AI grifters are now active and selling this snake-oil on to the rest of us.
> This is why the LLM cannot tell you it don't know.
To the untrained eye it appears that it knows everything about every question it answers, when in fact those who are experts in their own domains can detect when it has hallucinated and generated complete slop.
I don't think you even have to look deep into the model for that. There seem to be some projects that simply analyze the logits during decoding to estimate how "certain" the model is of the next token.
But if all people do is random sampling or blindly taking the highest-scored prediction, this will of course get lost.
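A toy version of what such projects do, assuming a local Hugging Face model (the model name is a placeholder and greedy decoding is used for simplicity): record the probability of each chosen token and the entropy of the distribution it was drawn from, so low-confidence steps could be flagged at the UI level.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")            # placeholder model
model = AutoModelForCausalLM.from_pretrained("gpt2")
model.eval()

def stepwise_confidence(prompt: str, max_new_tokens: int = 30):
    ids = tok(prompt, return_tensors="pt").input_ids
    report = []
    for _ in range(max_new_tokens):
        with torch.no_grad():
            # Next-token distribution (full forward pass each step; no KV cache, fine for a sketch).
            logits = model(ids).logits[0, -1]
        probs = torch.softmax(logits, dim=-1)
        entropy = -(probs * probs.clamp_min(1e-12).log()).sum().item()
        next_id = probs.argmax()                       # greedy decoding
        report.append((tok.decode(int(next_id)), probs[next_id].item(), entropy))
        ids = torch.cat([ids, next_id.view(1, 1)], dim=-1)
    return report  # per step: the token, its probability, and the distribution's entropy
```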
Metacognition is a pretty lofty concept, but practically speaking you are wrong. I recommend watching Karpathy's latest video on LLMs: it is possible to fine-tune a model specifically on uncertain questions so that it produces a statement expressing uncertainty.
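A hedged sketch of that idea: probe the model with factual questions, and for the ones it never answers correctly, add fine-tuning examples whose target is an explicit statement of uncertainty. `sample_answer` and `is_correct` stand in for your own model call and answer checker; the record format is only illustrative.

```python
def build_uncertainty_sft(qa_pairs, sample_answer, is_correct, n_samples=5):
    examples = []
    for question, gold in qa_pairs:
        attempts = [sample_answer(question) for _ in range(n_samples)]
        if not any(is_correct(a, gold) for a in attempts):
            # The model reliably gets this wrong: teach it to express uncertainty
            # instead of confabulating an answer.
            examples.append({
                "prompt": question,
                "completion": "I'm not sure; I don't have reliable information about that.",
            })
    return examples
```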
> If you peek inside the LLM, the process seems different for things they don't know.
This is really interesting, could you link some relevant resources? I wonder if a chatbot could at least signal at the UI level that the answer might be a hallucination.
Here is a small kicker. Human brains absolutely do the same.
In split-brain patients there are behaviours initiated by one hemisphere that are not known to the other (due to the severed connection), and the person will make up a reason (often quite a stupid one) for the action and believe it 100%.
It's eerily similar to the hallucinations of AI.
That said, current LLMs are not aware, but they are starting to act more and more like it.
I had a similar insight (blog post: [link redacted]).
In a very unscientific way, I would say that the LLM is not the whole brain, it's part of it and we are still in the process of simulating other parts. But it does seem to me like we've solved the hard part, and it's astonishing to me that people like authors of this article seem to think that the current state of things is where evolution stops.
The brain works through flows of stimulation, inputs and outputs probably creating consciousness somehow. LLMs are limited in self-stimulation, and their memory model is very basic compared to humans'.
But I feel most people come at this with the gut-feeling position that we have a divine spark that makes us tick. The view that we might be more basic, even crude, akin to LLMs, makes them uneasy.
Maybe I am just way deeper in this space than any well-adjusted person should be, but the line of 'did you know LLMs are bad at niche factual-information tasks in non-verifiable domains?' has become extremely boring to me. It feels very hard to find something actually new to say on the topic. I find it amazing people still feel the need to talk about it. But then again, I guess most people don't know the difference between a 4o and an R1.
> I find it amazing people still feel the need to talk about it.
From what I see, not very many people understand the limitations of LLMs (e.g., scroll up and down the very page you're reading now). This stuff probably needs to be talked about a lot more.
What's bothersome is the undertone of "Behold! For I have demonstrated that the entire world is led astray!"
This is every person in 2007 who looked at an iPhone and thought "What's the big deal?" and not only failed to recognize what had changed, but confidently insisted others were wrong.
> This is every person in 2007 who looked at an iPhone and thought "What's the big deal?" and not only failed to recognize what had changed, but confidently insisted others were wrong.
That’s an uncharitable take. People are pointing out its problems precisely because they believe AI is going to be transformative, but will have negative consequences for humanity as well.
I think it's more akin to someone in 2007 seeing an iPhone and saying: this is a terrible device, people will look at it while driving, it will be used to create a vast intrusive surveillance state, etc.
> This is every person in 2007 who looked at an iPhone and thought "What's the big deal?" and not only failed to recognize what had changed, but confidently insisted others were wrong.
That was me. Well, the first half — I wasn't confidently insisting others were wrong, because even back then I had a vague inkling that my preferences in general are very weird.
We've been through so many of these hype cycles before, the vast majority of which came to nothing, that it pays to be cautious. Are you sure it's a second-iPhone situation and not a repeat of the cryptocurrency bubble, which was at its peak just a few years ago and has yet to find any application besides financial speculation?
The author may not be as smart, educated, hot and successful as you, but the fact that today, people around the world, including students and educators, use LLMs as knowledge machines and take their output at face value shows that such critical posts are still urgently needed.
This is a good thing: accepting stuff written somewhere as true and repeating it uncritically greatly contributes to human stupidity. To quote a friend of mine: "But then I would have to question everything!?"
You may be interested to know that some extremist biased guts-be-more-dignified-than-cortex outlets around the world are named "The Truth" (I follow the press from many places).
The failure of education in teaching Critical Thinking around the world is massive. It would be a good idea to focus on how to exploit LLMs to improve the situation.
Also because, given the situation, the same "forces" that promote viscerality shamelessly naming it "The Truth" could have the opposite idea about chatbots and similar areas, exploiting them in their direction...
We've banned this account for breaking the site guidelines. You simply can't do this here, regardless of how provocative another commenter is or you feel they are.
I have finally found the value of llms in my daily work.
I never ask them anything that requires rigorous research and deep knowledge of subject matter.
But stuff like "create a script in Python to do X and Y" or "how to do XY in bash", combined with "make it better", produces really good, working results 95% of the time and saves me a lot of time! No more googling for ad-hoc scripting. It is like having a junior dev by your side 24/7, eager to pick up any task you throw at them, stupid and overconfident, never reviewing their own work. But "make it better" actually makes things better, at least once.
This matches my experience closely. LLMs are great at turning 10-minute tasks into 1-minute tasks. They're horrible at finding deep truth or displaying awareness of any kind.
But put some documentation into a RAG and it saves me looking things up.
You'll have the same "aha" moment when you hear a certain unelected vice-president confidently wade into your area of expertise — where his usual smooth-talking veneer shatters like a plate at a Greek wedding. Yet, his most devoted fans remain undeterred, doubling down on the myth of his omniscience with the zeal of a flat-earther explaining airline routes.
Bingo! The final straw was when Al bought a social media platform just to boost his own overconfident posts about a wide range of subjects. When he claimed to be the world’s best Diablo player I just lost it.
Yeah, Al Gore buying Slashdot and ruining it was the worst thing that happened to social media in the early 2000s and gave rise to the likes of MySpace and later Facebook.
It’s not about him—it’s about recognizing hubris. If someone confidently blunders into your domain and reveals they have no idea what they’re talking about, it’s universally amusing, regardless of the person. Thanks for showing us how eager some people are to defend personas over substance.
I pointed out that you jumped to make this LLM post about them, which is telling, just like you jumped to blame me for defending them, which is also telling.
Pointing to a real-life example of overhyped intelligence in a discussion about overhyped intelligence seems pretty fair to me. If your response is to attack and assume I’m too sensitive about overconfident billionaires, I’ll have to assume you’re just as sensitive about criticism of people you admire. Otherwise you would have moved on.
I asked Claude this morning. The answer seems pretty reasonable and contains a warning about accuracy.
> Michael P. Taylor reassigned Brachiosaurus brancai to the new genus Giraffatitan in 2009. The species became Giraffatitan brancai based on significant anatomical differences from the type species Brachiosaurus altithorax.
> Given that this is quite specific paleontological taxonomy information, I should note that while I aim to be accurate, I may hallucinate details for such specialized questions. You may want to verify this information independently.
I asked ChatGPT+ with Scholar GPT as the GPT. This is the answer I got back; not too bad:
The species Brachiosaurus brancai was reassigned to its own genus, Giraffatitan brancai, by paleontologist George Olshevsky in 1991. Olshevsky proposed that Brachiosaurus brancai, which was originally described by Werner Janensch in 1914, was distinct enough from Brachiosaurus altithorax (the type species of Brachiosaurus) to warrant its own genus. Subsequent studies, particularly by Michael Taylor in 2009, provided further anatomical evidence supporting this distinction.
I only trust LLMs with questions whose answers prove themselves correct or incorrect - so basically code (if it runs and produces the result I was looking for, then great), or where the answer is a stepping-off point for my own research on something non-critical like travel. ChatGPT is pretty good at planning travel itineraries, especially if pre-prompted with a good description of the group's interests.
I'm quite excited about many of the specific use cases for LLMs, and have worked a few things into my own methods of doing things. It's a quick and convenient way to do lots of actual specific things.
For example: if I want to reflect on different ways to approach a (simple) maths problem, or what sorts of intuitions lie behind an equation, it is helpful to have a tool that can sift through the many snippets of text out there that have touched on that and similar problems, and present me with readable sentences summing up some of those snippets from all those places. You have to be very wary, as highlighted by the article, but as "dumb summarisers" that save you trawling through several blogs, they can be quicker to use.
Nonetheless, equating this with "reasoning" and "intelligence" is only possible for a field of academics and professionals who are very poorly versed in the humanities.
I understand that tech is quite an insular bubble, and that it feels like "the only game in town" to many of its practitioners. But I must admit that I think it's very possible that the levels of madness we're witnessing here from the true believers will be viewed with even more disdain than "blockchain" is viewed now, after the dust has settled years later.
Blockchain claimed it was going to revolutionise finance, and thereby upend the relationship between individuals and states.
AI people claim they're going to revolutionise biology, and life itself, and introduce superintelligences that will inevitably alter the universe itself in a way we've no control over.
The danger isn't "AI", the danger is the myopia of the tech industry at large, and its pharaonic figureheads, who continue to feed the general public - and particularly the tech crowd - sci-fi fairytales, as they vie for power.
The most interesting aspect of all this “AI” craze is how it plays into people's forgotten wishes to believe in miracles again. I have never seen anything else that exposes this desire so conspicuously. And of course all the shrewd operators know how to use this lever.
In ancient times you had to travel to Delphi to consult Apollon's oracle. Now you can do it from the comfort of your armchair.
Hmm. Think the science guys probably do actually understand "reasoning" and "intelligence" and it's the humanities guys who don't understand much AI or science.
Likewise, if you don't believe in "Artificial Intelligence", take five minutes to ask it.
Or preferably, five minutes to understand how it works and what it can and cannot do, then five minutes to ask it something actually suitable.
"AI" (LLMs) are currently good at:
- language understanding, i.e. understanding and processing text you provide it. For example, taking a wall of text and answering questions about points mentioned there, or general sentiment, or extracting data from it etc.
- somewhat general knowledge, i.e. stuff that was sufficiently frequently represented in the training data
Absent additional tricks, "AI" is really bad at obscure knowledge or complex, multi-step thinking. We are slowly getting there, but we aren't there yet. And this is not something the LLMs themselves do, but rather the wrappers around them, which give the model tools to get additional information: first prompting the model to select a tool, then prompting it again with the tool's output, repeatedly.
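A bare-bones sketch of that wrapper pattern (not any real agent framework's API; `chat` and the tools are placeholders): prompt the model to pick a tool, run it, feed the output back, and repeat until it answers.

```python
import json

TOOLS = {
    "web_search": lambda q: f"(stubbed search results for: {q})",
    "calculator": lambda expr: str(eval(expr)),  # toy only; never eval untrusted input
}

def answer_with_tools(chat, question: str, max_steps: int = 3) -> str:
    transcript = f"Question: {question}\n"
    for _ in range(max_steps):
        step = chat(
            transcript
            + f"Available tools: {list(TOOLS)}\n"
            + 'Reply with JSON only: {"tool": <name or null>, "input": "...", "answer": <text or null>}'
        )
        try:
            decision = json.loads(step)
        except ValueError:
            return step  # the model answered in plain text; treat that as final
        if decision.get("tool") in TOOLS:
            observation = TOOLS[decision["tool"]](decision.get("input", ""))
            transcript += f"Tool {decision['tool']} returned: {observation}\n"
        elif decision.get("answer"):
            return decision["answer"]
    return chat(transcript + "Give your final answer now.")
```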
A good rule of thumb is that if an average well-educated, intelligent person could answer it without further research, an LLM will probably be able to. I'd even say that if an average fresh-out-of-school graduate of the corresponding discipline can answer it quickly (without further research or sitting down for ten minutes to do the math), there's a good chance the AI will be able to answer it, but it might also get it horribly wrong, and you will have a hard time distinguishing between those cases if you have no knowledge in the field.
As the author mentions at the very end of the article, the hallucination problem also means that the best kind of tasks are where you can quickly verify whether the response was useful. A system that produces misleading responses 50% of the time is useless if you can't distinguish them, but very useful if in those 50% it saves you ten minutes of work and in the other 50% you lose a minute by trying.
Yup, and the dangerous uses are where 90% or more of the time the answers are right, and 10% of the time the answers are wrong - but no one can easily tell the difference between the two. And the accuracy of the answers matter.
I used o3-mini reasoning on that very question two times, and it used a similar way of reasoning as he did to answer it correctly both times.
I agree with his premise but calling it a pump and dump with no possible future developments is so ridiculous.
This example is from 1 year ago. With current LLM progress that means it's basically a historical document.
Both 4o with internet and o3 without internet gave the right answer.
Somebody in the comments did a deep research run which produces an extremely detailed History of the naming.
Of course reliability is not a solved problem in general but people here would do well to check both blog posts and generative model output for factuality and relevance.
> The species formerly known as Brachiosaurus brancai was reassigned to its own genus, *Giraffatitan*, through a series of taxonomic revisions. The key steps were:
> 1. *Initial Proposal (1988):*
Gregory S. Paul first suggested separating B. brancai from Brachiosaurus in 1988, proposing the informal name "Giraffatitan" as a subgenus. However, this was not widely accepted at the time.
> 2. *Formal Reclassification (2009):*
Paleontologist *Michael P. Taylor* conducted a detailed comparative study of Brachiosaurus specimens. In his 2009 paper, he formally elevated B. brancai to the full genus *Giraffatitan brancai*, citing significant anatomical differences (e.g., limb proportions, vertebral structures) from the North American Brachiosaurus altithorax.
> Thus, *Michael P. Taylor* is credited with the formal reassignment of the species to the genus Giraffatitan in *2009*.
"What's the value of quantities Q0 and Q1?" // "[Bot:] Q0 is 1.5 u/w, or 0.67 u per each w. Q1 is unknown." // "Can you see the error in your previous?" // "[Bot:] Yes, I can re-read my outputs! Q0 is (Q0 is 1.5 u/w (0.67w/u), Q1 is 2.0 u/w". // "How do you now know Q1?" // "[Bot:] I don't, I'm sorry."
The issues should be clear. Given the state of the population (e.g. lawyers that delegate research to unreliable sources), we have to reiterate the warning. Given that even in this audience we had voices claiming that the problem of intelligence was solved (how did it happen?), we sometimes reiterate it even here. But we should focus on the solutions, as the problem should be clear...
Why do people still ask LLMs niche factual knowledge, especially involving unique names?
And even worse, judge its performance based off that?
When someone does that I immediately know that they have not the slightest clue about the underlying workings of any deep learning model, let alone basics of statistics.
Why? Because some combination of marketing flacks, influencers, and insipidly credulous technophiles keep advancing statements that strongly imply that these tools are intelligent and generally useful, despite being obviously curtailed in their utility and possessing no intellect at all. Next question?
Epistemology is a hard subject. It's hard to know something in depth. And the more you know about something, the more it is to be known (fractal nature of knowledge). So believing that LLMs can understand the world JUST by reading the internet without the supporting human mental structures and experiences is a big mistake. When you know something and you ask the LLM, it becomes obvious.
As LLMs are not trustworthy, it's key to use them for things that are easy to check. Some kinds of programming apply when the consequences of errors are low and complexity is manageable.
> As LLMs are not trustworthy, it's key to use them for things that are easy to check.
If system A is only useful when its output is confirmed by system B, then you might conclude that the usefulness lies entirely in system B, not in system A.
In other words: what good is an LLM if it can only give you trustworthy answers to questions where you already know the correct answer? What good is it if, after getting the answer from an LLM, you then have to go and get the right answer somewhere else?
And feeding the initial question to perplexity.ai yields a result that is practically identical to the perfect answer.
Of course, the author has a point. As many here have already stated, it isn't even worth talking about because it is so obvious. Then again, as many correctly note, LLMs are sold with this promise.
Practically, the subject is way too niche - I'd assume - and there are probably also conflicting statements on the interweb that the model was trained on.
Having said that - Perplexity got it right with web access turned on.
An actual artificial intelligence would have a logical underpinning to its computation. Computers can execute any logic really well. Intelligence depends on logic. LLMs however, are not using logic to arrive at any conclusion, they are just a bunch of "wires" from input to output.
The promise is that LLMs are a form of intelligence, but it is only a complicated statistical approximation, which happens to correlate with correctly answering queries that are contained in its training dataset.
My speculation is that a true AI (in the sense that it has some form of intelligence) would show signs of intelligence even at very small scales. LLMs are completely useless at really small scales, but even small creatures like rats or even insects display intelligence in proportion to their scale.
I think these tools will help automate information processing of all kinds, but they are by no means intelligent, and we will not be able to rely on them as if they were intelligent, because we'll still need to include verification at every level, similarly to how self-driving cars still need a human to pay attention. Useful, sure, but it falls short of the promise that they will replace humans because they can "think". We're not there yet from a theoretical standpoint.
Intelligence, in its basic form, is the skill of prediction. A test for intelligence could be asking to continue a sequence of words: yellow, green, blue, red, ... In order to do that, a subject needs to create a model of what these words mean and use that model to continue the sequence. Predicting the next word in an arbitrary text is just that. However the current batch of LMs are missing an ingredient in their formula and I bet it will be found soon.
The bigger philosophical problem is that most intelligent people confuse themselves with their minds and when they create a simple deterministic machine that imitates their mind well enough, the result will be a societal meltdown. The subtle difference between an AI and a human mind is that the latter is inspired by intuition whose nature can't be reduced to a thinking process. AI can be compared to a dark mind not guided by intuition, wandering aimlessly in the woods of its own delusions.
Agreed, and to add to the second point, in order to build it we must first understand it, although there is a chance to stumble upon it, but LLMs have already shown they are nothing like biological brains. One example is how brains use spatial-temporal mapping in the hippocampus or wherever, and it uses that as a database of sorts. We don't know if the features of the brain are necessary for actual intelligence yet. They may or may not be but chances are evolution took the shortcut and that intelligence is not easy to recreate unless taking this biological approach.
Disclaimer: I've done a lot of stuff with local models and RAG methods, I haven't done a lot of work with public models and so don't know how Gemini, GPT and so on are working right now. Claude + GraphRAG through Bedrock is my main mode of playing with this stuff right now.
Things LLMs are good at include summarisation and contextualisation.
They can use that facility to help summarise processes and steps to get something done, if they've been trained on lots of descriptions of how to do that thing.
What they're not good at is perfect recall without being nudged. This example would have been very different if the LLM had been able to RAG (or GraphRAG) a local data source on palaeontology.
I think we're going to see an evolution where search companies can hook up an LLM to a [Graph]RAG optimised search index, and you'll see an improved response to general knowledge questions like this. In fact, I'd be surprised if this isn't happening already.
LLMs on their own are a lossy compression of training material that allow a stochastic parrot to, well, parrot things stochastically. RAG methods allow more deterministic retrieval, which when combined with language contextualisation can lead to the kinds of results the author is looking for, IME.
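For what it's worth, the retrieval-then-generate pattern is simple to sketch (no particular vector store, graph, or model implied; `search` and `chat` are placeholders for your own retrieval and completion functions):

```python
def rag_answer(question: str, search, chat, k: int = 5) -> str:
    # Retrieve the k most relevant passages, e.g. from a palaeontology corpus.
    passages = search(question, k=k)
    context = "\n\n".join(f"[{i + 1}] {p}" for i, p in enumerate(passages))
    prompt = (
        "Answer the question using only the sources below. "
        "Cite sources by number, and say 'I don't know' if they don't contain the answer.\n\n"
        f"Sources:\n{context}\n\nQuestion: {question}"
    )
    return chat(prompt)
```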
> Who reassigned the species Brachiosaurus brancai to its own genus, and when?
> ChatGPT said:
> The species Brachiosaurus brancai was reassigned to its own genus, Giraffatitan brancai, by paleontologist George Olshevsky in 1991. This reclassification was later supported by a detailed study by Michael P. Taylor in 2009, which reinforced the distinction between Brachiosaurus and Giraffatitan based on anatomical differences.
My ChatGPT has just given a correct answer. What am I doing wrong?
Not even the effort to check what has happened in a year by re-asking the same questions to newer models. We went from last year's ChatGPT being almost useless except as a "reference" for well-known things (like how to do X in Python), to today's Claude Sonnet 3.5, o3-mini-high and DeepSeek V3/R1 being far more useful models, capable of actual coding, bug fixing, ...
When a model is trained you end up with nothing more than a bunch of weights. These weights are used to predict the next token in a sequence. LLM models do not have an external memory; they only retain whatever knowledge survived training in those weights. The astonishing thing here is that even pretty small models now know so much that people assume you can ask knowledge questions about any subject under the sun and get a factual answer. Absurd of course - logically impossible, as follows trivially from the size of the model used. The only thing the author has proven with his little test is his own lack of scientific curiosity.
For any question that requires research (or deep expertise in a specific field) you need to use either a research model (that can reason and look things up in external knowledge bases) or you need a model that is trained on the kind of questions that you want to ask it so that it retains that data.
I think you are unintentionally participating in gaslighting. Hallucination in LLMs is a massive problem; normally, any computational system we dealt with would immediately be considered unusable if, instead of refusing to provide information it cannot provide precisely, it confabulated - say, if a file system started giving plausible-looking metadata about non-existent files.
The only way to check whether an LLM's output is true is to do the work (to have it done by a real person).
For tasks that are trivial to verify, it's OK: a code compiler will run the code written by an LLM. Or: ask an LLM to help you during the example-mapping phase of BDD, and you'll quickly be able to tell what's good and what isn't.
But for the following tasks, there is a risk:
- ask an LLM to make a summary of an email you didn't read. You can't trust the result.
- you're a car mechanic. You dump your thoughts to a voice recorder and use AI to turn them into a structured textual report. You'd better triple-check the output!
- you're a medical doctor, attempting to do the same trick: you'd have to be extra careful with the result!
And don't count on software testing to make AI tools robust: LLMs are non-deterministic.
Frankly, I was thinking just the opposite: how are there still smart people who don't get the difference between intelligence, knowledge, and introspection or self-awareness? This guy asks a question about some niche piece of trivia and, surprise, he gets back some very intelligent confabulation. The intelligence is there; the knowledge and self-awareness aren't.
This is something that I often say: the general population confuses LLMs with a kind of new-generation Google and expects them to be a knowledge database.
When in fact LLMs are good for some tasks but not for general knowledge. You can think of the cutoff date as a strong example of that. The knowledge in itself is just a base for the ability to perform tasks.
In addition, for the example in this article, the person is testing the LLM on a common chat service without dealing with settings that might be important, like the temperature.
If you want the closest thing to an exact reply you should set the temperature to 0, but general-public tools use a moderate temperature so they can be creative when needed.
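For example, with an OpenAI-style client you can set the temperature explicitly instead of relying on the chat UI's default (the model name is just an example, and temperature 0 still doesn't guarantee a correct answer, only a less random one):

```python
from openai import OpenAI

client = OpenAI()
resp = client.chat.completions.create(
    model="gpt-4o-mini",  # illustrative model name
    messages=[{"role": "user", "content":
               "Who reassigned Brachiosaurus brancai to its own genus, and when?"}],
    temperature=0,  # near-greedy decoding: always prefer the most likely tokens
)
print(resp.choices[0].message.content)
```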
I get that they are not a new kind of Google - but often, there is no alternative because there is no old kind of Google anymore. Search results have become so bad that using llms as at least a starting point is becoming more and more necessary.
And if that is becoming a common opinion among people who are, in practice, professional search-engine users, imagine how bad it is for normal folk. They can hardly be blamed for using the chatbots shoved in their faces as a search engine.
It is a gigantic regression to the mean. Everybody thinks (s)he's "normal", but in fact only spans a small part of knowledge. Getting answers from a different location in knowledge space can feel like speaking to an expert but it's just some "other normal". My personal mental model of hallucinations is that knowledge and truth live on a manifold and not a continuous space and learning that manifold statistically is (too) hard. You discover answers from the "non-manifold" in your area but not so easily in other domains.
It’s possible for reasonable arguments to exist that support either side of the pet transactionality dilemma. What often happens is that people have their own personal biases that cause them to pick a side. But would you consider a group of people to not be intelligent specifically because individuals that make it up cannot agree on a single answer?
I just tried it with 4o and it disagrees with me giving reasons, and even elaborates when I pushed back.
It basically said cats are transactional to an extent, in terms of basic needs, but that beyond that, they aren't. And for dogs, it flat out disagreed, saying they aren't transactional.
Reversing the statements in a new untainted chat didn’t alter the responses — ChatGPT remained consistent.
ChatGPT, not being the compendium (stealer) of knowledge, would have to be fed the correct information; then the prompt will work.
It still fails by being confidently wrong.
The brief article hits at people trusting the tool without questioning the output.
Meanwhile, we have people using Codeium or Copilot to write code, and that sort of works since the code eventually needs to be compiled and tested (unit, integration, system, requirements sell-off).
There is no test for the truth available to everyone else.
I'm currently using AI code completion.
Since then I sometimes have subtle errors in my code that didn't happen before. Here is how that happens:
AI suggests something to me that looks right at a glance. I accept it and move on. Then later I hunt down a strange bug. When I find it I'm like "wait, that line's wrong! I didn't write that".
Who reassigned the species Brachiosaurus brancai to its own genus, and when?
---
Here is the transcription of the text from the image:
Reasoned for 8 seconds ▼
The user is asking about the reclassification of Brachiosaurus brancai to its own genus. Originally, Janensch described it as Brachiosaurus brancai in 1914. However, in 2009, Michael P. Taylor and Matt Wedel reassigned it to Giraffatitan brancai. Their paper, titled “Brachiosaurus brancai and the early evolution of titanosauriform sauropods,” argued for this change, distinguishing Giraffatitan brancai from Brachiosaurus. I’ll check the exact references to confirm the specifics.
It seems “Giraffatitan” was first proposed by Gregory S. Paul in 1988, when he reassigned Brachiosaurus brancai to its own genus. Although some confusion exists, it’s generally accepted that Paul was the main figure behind this change. Other references, such as a 2007 paper by Sereno and a 2009 paper by Taylor and colleagues, also discuss this reclassification. The 1988 reshuffling is usually seen as the turning point for Brachiosaurus brancai, although later research helped solidify this decision.
The species was reassigned to its own genus—Giraffatitan—by paleontologist Gregory S. Paul in 1988.
>Why do I keep banging on about this? Because we’ve reached an especially toxic phase in the history of AI ... We’re in the “pump” phase of an industry-wide pump-and-dump.
That kind of assumes AI is bad, is merely being hyped, and will all crash. However, AI performance is not constant - as many commenters here have noted, the article's example from a year ago works OK now, and in the following years AI will keep improving.
I like to think of it less as artificial intelligence and more as a combination of a lossy zip file of the internet and a pretty coherent word generator.
I recall my AI professor in uni telling us during the first lecture that «Artificial intelligence is this target that, once we get there, is no longer artificial intelligence - it is just an algorithm» - and this still feels like the case.
You frantically tab away from reddit as the white- and black-clad men storm into your office and zip-tie you to your Steelcase faster than you can shout what the hell. They calmly explain that an expert will soon enter and quiz you. You must answer the expert's questions. It doesn't matter if you know the answer or not, just say something. Be flattering and helpful. But just answer. If you do this, they will let you go.
They crouch under your desk as a man in a grey suit and spectacles enters and pulls up a chair in front of you. He peers over his glasses at you, and asks, who classified the leptosporangiate ferns, and when was it done? The what now?
I'm happy you asked such an excellent question, you say. It was Michael Jackson, in 1776.
A sneer flicks over the man's upper lip. He jerks upright and takes a step back from you. This man, he declares with disgust, is not intelligent!
> You must answer the expert's questions. It doesn't matter if you know the answer or not, just say something. Be flattering and helpful. But just answer. If you do this, they will let you go.
This contrived example shows why GPT's version of “intelligence” is quite different from ours.
It’s very hard to get an AI to answer “I don’t know” reliably; meanwhile, in your story, a human has to be coerced with violence into answering anything but “I don’t know”.
My understanding with present-day LLMs is that you can ask it something and get an answer, but you have to treat it with the same degree of confidence as hearsay. You may then ask it to cite its sources, at which point you get reliable references, or it will apologize for getting things wrong.
"Because we've reached an especially toxic phase in the history of AI. A lot of companies have ploughed billions of dollars into the dream of being able to replace human workers with machines, and they are desperate to make us believe it's going to work - if only so they can cash out their investments while the stocks are still high."
Over its short history so far we have learned that Silicon Valley's only viable "business model" is data collection, surveillance and online ad services. "AI", i.e., next generation autocomplete, can work for this in the same way that a "web browser" or a "search engine" did. In the end, no one pays for a license to use it. But it serves a middleman surveillance "business model" that solicits ad spend and operates in secrecy. When this "business model" falters, for example because computer use and ad spend stagnates or shrinks, then Silicon Valley's human workers are not "needed". Large numbers of these human workers are paid from investment capital or ad spend, not from fees for services or the sale of products.
Perhaps the question is not whether "AI" can "replace" Silicon Valley's human workers. Perhaps the question is whether the online ads "industry" is sustainable.
GPT 4.5, Orion, is available tonight, and seems to get it correct.*
I think the author's lens is interesting; I call it "information resolution." Orion is the first model that has an understanding of the style of short story written by famously-Radiohead-affiliated artist Stanley Donwood. It also knows the "codename" for Material You, i.e. a random name for the library I was prototyping; it's much finer-grained than existing models.
* The species Brachiosaurus brancai was reassigned to its own genus, Giraffatitan brancai, by George Olshevsky in 1991. This reclassification was later supported by further studies, notably by Michael P. Taylor in 2009, which reinforced the distinct differences between Giraffatitan and Brachiosaurus altithorax.
First of all, I'd say his question is a bit ambiguous -- even reading his little history there, it's not clear what answer he wanted. It appears we have two renamings:
- Brachiosaurus brancai (species) -> Brachiosaurus (Giraffatitan) brancai (sub-genus) in 1988.
- Brachiosaurus (Giraffatitan) brancai (sub-genus) -> G. brancai (genus) in 2009.
So if "Brachiosaurus brancai" as a species was never renamed into a genus, which answer did he want -- when it was renamed out of being only a species, or when it was renamed into being its own genus?
Claude's response:
> Brachiosaurus brancai was reassigned to the new genus Giraffatitan by paleontologist Gregory S. Paul in 1988. The species then became known as Giraffatitan brancai.
> Since this is quite specific paleontological information, I should note that while I aim to be accurate, I may hallucinate some details when discussing such specialized topics. I recommend double-checking these details against reliable paleontological sources.
So:
1. Nearly correct -- it was assigned a new sub genus in 1988, and did eventually become known as the genus Giraffatitan brancai.
2. It recognized that this was an area where its knowledge would be weak, and asked me to double check, which was his main complaint about ChatGPT.
I asked it if there was any ambiguity in the question, and it said:
> Yes, there is ambiguity. The question asks about the reassignment of Brachiosaurus brancai as if there was only one such event, but there were actually two significant taxonomic changes:
> In 1988, Gregory S. Paul informally suggested moving B. brancai to its own genus, Giraffatitan
> In 2009, Michael P. Taylor formally made this generic distinction by properly establishing Giraffatitan as a distinct genus from Brachiosaurus
> My previous answer was incomplete as it only mentioned Paul's 1988 informal suggestion, not Taylor's 2009 formal reassignment. Taylor's work was the official taxonomic change that established Giraffatitan brancai as the valid name
I think Claude did about as well as a random human would have done who'd studied this 10 years ago and been asked a question with false assumptions embedded.
Claude and ChatGPT aren't authorities on every subject. They're that guy at the office who seems to know a bit about everything, and can point you in the right direction when you basically don't have a clue.
Also that percentage gets higher as we go. 2 years ago it would be correct maybe 20% of the time. The trend is obvious. I’m not sure we will ever reach 100%, but then again no human is always 100% right, even domain experts.
Using o3-mini-high + Search I get the right answer he was looking for:
The species was first split at the subgeneric level by Gregory S. Paul in 1988—he proposed the name Brachiosaurus (Giraffatitan) brancai. Then in 1991 George Olshevsky raised the subgenus Giraffatitan to full generic status, so that B. brancai became Giraffatitan brancai. Later, a 2009 study by Michael P. Taylor provided detailed evidence supporting this separation.
I guess Mike Taylor will gracefully cede his point now?
It is very funny to me that someone would feel the need to complain about a niche factual error in pretrained LLMs without even enabling RAG. If you even know the basics about this field, you shouldn't be surprised.
Of course this was probably more about ego stroking his paleontological achievement than a thoughtful evaluation of the current state of LLMs.
You received a tool. A great tool, a magnificent tool.
Learn to understand its limitations and make the best use of it. Of course it gets confused by lesser-known facts; that's something you can't ignore even if you interpret AI as a tool that compresses knowledge.
The more salient point is that you might know the limitations of the tool, I might know the limitations of the tool, but millions of people who don't are using it for things it has known limitations for, because the marketing blitz that sits atop this glosses over those limitations.
Nobody forces me to "believe in" AI. Unlike other things!
Remember how authorities "hallucinated" that some medication was 100% effective with 0% side effects?! Despite scientific opposition, and despite the warning leaflet included WITH the medication? And how you would go to jail for ignoring such hallucinations!?
Before GPT-3 was public, there was BLOOM 176B, and this model made my skin crawl because it was capable of answering "I do not know." That was an experience of a lifetime. I was honestly impressed and at the same time scared.
The author expects a single electronic "brain" to know everything that every human brain knows about anything even vaguely scientific or popular, or at least to know, off the top of its head, that it doesn't know this thing.
If you asked a human this question they'd respond with "let me find out and get back to you", educate themselves by accessing various knowledge repositories and then formulate the answer.
And some AIs are possibly doing that even today, just not the ones freely accessible to everybody.
It starts with the falsehood of the Turing test, continues with the idea of branding the errors "hallucinations", moves ahead with "experts" working for their salaries, bonuses and shares, and lands here: benchmarking a dataset statistically, emulating progress and pushing us into an "Intelligent Age" while accelerating data collection, normalizing biometric surveillance, and hiding incompetence and speculation, the real reason behind mass layoffs.
A pipe dream, ideal for the technocratic mafia (in 1932, Scott and fellow technocrat Walter Rautenstrauch formed the "Committee on Technocracy" at Columbia University) of Bilderberg members, Peter Thiel, Eric Schmidt and their pupils like Musk and co. Now they have carte blanche for AI governance and regulatory capture.
Starlink, Starshield (Skynet), Stargate, Closed AI, Grok.
Married to the masters of the old system, they will deliver "efficiency" on a grand scale.
Transhumanism is the "inevitable" future of destruction of human dignity, value and psyche.
And all of this through psyops for a text calculator with stochastic-parrot tendencies.
A calculator is a useful tool, but not intelligence in any form.
Machine learning is real, and the biggest danger is the tendency of a hyper class of billionaires to apply surveillance and governance automation.
Ask Larry Ellison about his dream of citizens with the best behaviour. Not so different from China's social scoring system.
I asked Perplexity (Pro, if it matters). Seems pretty solid to me. I'm not sure this article is making any point other than "the knowledge base of static artificial intelligence systems is finite and they don't express uncertainty well". Amusingly it links to this blog post, but you'll see the sources it uses are largely not this post.
---
The reassignment of Brachiosaurus brancai to its own genus occurred through a series of steps by different paleontologists:
## Initial Classification
In 1914, Werner Janensch first described the African species as Brachiosaurus brancai, considering it part of the existing Brachiosaurus genus[4].
## Path to Reclassification
Gregory S. Paul made the first step toward separation in 1988 by proposing a subgenus classification, Brachiosaurus (Giraffatitan) brancai, after noticing significant skeletal differences[4][10].
George Olshevsky then elevated Giraffatitan to full genus status in 1991, though this change was largely disregarded by the scientific community at the time[4][10].
## Formal Establishment
The definitive separation came in 2009 when Michael P. Taylor published a detailed study demonstrating 26 distinct osteological differences between the African and American species. This comprehensive analysis firmly established Giraffatitan as a separate genus, creating the combination Giraffatitan brancai[3][4]. From 2012 onward, most researchers have accepted this classification[10].
The point not discussed here is where the information comes from.
Is it a primary source or a secondary source [1]? And how should this new information be incorporated?
In their quest to build a "truthful" knowledge base, LLMs implicitly incorporate the facts they read in their training dataset into their model weights. Their weight-update mechanism allows facts from sources of different authority to be merged and compressed rather than stored many times over, as they would be in a traditional database.
This clustering of similar new information is both the curse and the blessing of AI: it allows faster retrieval and reduces memory use. This update mechanism is usually done via Bayes' rule, in something called "belief propagation".
LLMs do this implicitly, and have not yet discovered that while belief propagation works most of the time, it is only guaranteed to work when the information graph has no more than one loop. Otherwise you get self-reinforcing behaviour, where one source cites another and gives it credit, which in turn lends credit back to the first source, reinforcing a false fact in much the same way that link farms promote junk sites.
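A minimal sketch of why a citation loop breaks naive updating (toy numbers, not a model of any real training pipeline): each source treats the other's citation as fresh evidence, so the belief ratchets up without any new information ever entering the loop.

```python
# Toy demonstration of evidence double-counting in a two-source citation loop.
# The likelihood ratio and starting odds are made up for illustration.

def bayes_update(odds, likelihood_ratio):
    """One Bayes update in odds form: posterior odds = prior odds * LR."""
    return odds * likelihood_ratio

odds = 0.1                 # start sceptical of the claim (P ~ 0.09)
LR_PER_CITATION = 2.0      # naive updater treats every citation as 2:1 evidence

for round_ in range(1, 6):
    odds = bayes_update(odds, LR_PER_CITATION)  # "source A: B confirms it"
    odds = bayes_update(odds, LR_PER_CITATION)  # "source B: A confirms it"
    print(f"round {round_}: P(claim) ~ {odds / (1 + odds):.2f}")

# The probability climbs toward 1.0 although no independent evidence was ever
# added -- the loop launders the claim's own echo back in as support.
```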
When you repeat false information to an LLM many times, you can make it accept it as truth; it's very susceptible to basic propaganda. LLMs can act as a triple-store or a quad-store depending on how, and on what, they are trained.
But LLMs can also incorporate an error-correction mechanism. In this article, the LLM tried twice to correct itself and failed to do so, but the blog author published a post that will be incorporated into the training dataset, so the LLM will have another example of what it should have answered, provided the blog author is perceived as authoritative enough to be given credence.
This human-in-the-loop error correction can also be substituted by a mechanism that relies on self-consistency, where the LLM builds its own dataset, asks itself questions about the facts it knows, and tries to answer them from first principles.
For example, the LLM can use tools to retrieve the original papers, verify their dates, and see who coined a term first and why. By reasoning it can create a rich graph of interconnected facts, and it can look for incoherence by questioning itself. The richer the graph, the better information can flow along its edges.
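As a purely hypothetical sketch of such a self-consistency pass, you could extract the model's own claims into a small structure and flag the pairs that disagree; the conflict check below is a trivial stand-in for what would really be a model call or a retrieval of the original papers.

```python
# Hypothetical sketch: flag pairs of extracted claims that disagree so they can
# be re-verified with tools. The conflict test is a deliberately simple stand-in.
from itertools import combinations

def conflicting(a, b):
    # Stand-in check: same event, different year.
    return a["event"] == b["event"] and a["year"] != b["year"]

def self_consistency_check(claims):
    """Return every pair of claims that should be re-verified against sources."""
    return [(a, b) for a, b in combinations(claims, 2) if conflicting(a, b)]

claims = [
    {"event": "Giraffatitan raised to genus rank", "year": 1991, "source": "Olshevsky 1991"},
    {"event": "Giraffatitan raised to genus rank", "year": 2009, "source": "model recall"},
    {"event": "subgenus Giraffatitan proposed",    "year": 1988, "source": "Paul 1988"},
]

for a, b in self_consistency_check(claims):
    print(f"re-check: {a['source']} says {a['year']}, {b['source']} says {b['year']}")
```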
Because LLMs are flexible, there is a difference between what they can do and what they do, depending on whether we trained them so that the behaviour we want actually emerges. If we don't train them with a self-consistency objective, they will be prone to hallucinations. If we train them on human-feedback preferences, we will get sycophantic AIs. If we train them on "truth", we will get "know-it-all" AIs. If we train them on their own mirrors, we will have what we will have.
I believe at this point it would not be inappropriate to say that any sufficiently advanced AI system is indistinguishable from bullshit, in the sense that bullshit is "speech intended to persuade without regard for truth" [1].
On a moral level, watching how tech bros are sucking up to Trump/Musk and how their companies are betting all their chips on the AI roulette, it all seems related.
Hmm. But it is not an oracle. I wonder whether, prompted as an expert in palaeontology, it might perform better.
That said, I do wonder whether its corpus of training data contained that much information on your subject. It is rather niche, isn't it, compared to cooking recipes, or basic software development techniques of two years ago, or chemistry, maths and physics.
My friend is a leading research chemist, and he and one other person in China are working on this one particular field - individually - so there would be little information out there. I asked ChatGPT 4o to give an overview of what he was doing based on the little information I knew. He was astounded. It got it spot on. I asked it to elaborate and to come up with some new directions for research, and the ideas it spat out were those he had considered and more.
LLMs are generative AIs and have to be used as such: to generate a report from facts, to summarise an article, to translate from one language to another, anything where we agree to sacrifice accuracy for a gain in creativity. As storage of facts they are borderline awful.
Geoffrey Hinton recently discussed how neural networks, even human ones, may be "generative" in how they recall information. Our memories of events are hazy, and change every time we recall them. Memorized rote facts can also be hazy in this way, and are subject to mix-ups and confabulations. That we "generate" memories from neural weights in our brain can also help explain why it seems like our brains store so much information. Perhaps they don't, they instead store lossy neural weights built from our past sensory experience, and we generate the rest and use metacognitive reasoning and a form of attentive reasoning upon recent data/context to sort through the potential errors in memory.
As you point out, LLMs work much better when you ask them to operate on objects within their context window, especially artifacts they know how to work with, like code and text. But I think people are so trained to ask questions of the oracle and expect answers (e.g. Google), and who can blame them? That is the UX built into people's muscle memory for open-ended text input boxes. The launch of ChatGPT Search is recognition of this. Plus, most people are being told to treat these chat boxes as strong AI rather than as text/code-processing programs with specific strengths and weaknesses.
"The Most Interesting Thing in AI - The Atlantic - Episdoe 1 - Machine Consciousness - with Nicholas Thompson and Geoffrey Hinton.
What if the most advanced AI models could think and respond in a way that felt like a human consciousness? How might that transform our understanding of intelligence itself? Some of the leading AI scientists believe that a super-intelligent form of this technology is only five to ten years away. This episode explores the idea of AI consciousness and delves into how the act of dreaming is connected to neural networks in unexpected ways."
(I shortened the answers, imagine this padded out a lot more with repetitious blurb and chipper nothingness)
what's the best way to fire fast with a joystick that has no auto-fire function?
> 1. Tap the Button Rapidly 2. Use a “Double Tap” Technique 3. Use a “Flick” Motion 4. Joystick or Button Placement 5. Software/External Programs (If Playing on PC) 6. Practice
you missed the main one
> Ah, I see! The main technique I missed is "firing by holding the button down and rapidly alternating between pressing and releasing"—often referred to as "button mashing" or the "rapid toggle method." Here's how it works: Press the fire button and hold it down. Quickly release and press it again repeatedly.
I'll give you a hint, it involves making the elbow really stiff
> Ah, got it! You're talking about "elbow mashing"—a technique that involves using the elbow to press the joystick button. This method exploits your body mechanics for rapid, repetitive motion, leveraging the larger, stronger muscles of your arm to generate faster inputs.
------
Well, wow, that is stupid. It can't even say "I don't know", it can't even ask me! It's just text vomit. A dog that can tilt its head in confusion can do more.
I couldn't find the "official" name, if any, but here's a redditor from 11 years ago explaining what I was looking for:
> I have no idea how to explain this but I can do it, so I'm just going to wing an explanation and hope you understand/do it. Pretty much try and "lock up" your elbow. If you force this feeling hard enough your arm or hand will start shaking. From here add a tiny bit of forward force in your wrist so that your hand shakes up and down rather than side to side.
I learned it from an older kid when playing on the Amiga. Sure, nothing is "the best" way, but nothing else my body is capable of can click faster, and any "pro" would mention this before just hallucinating insight with great confidence.