Intelligence and knowledge are distinct concepts. Asking about its knowledge teaches nothing about its intelligence.
Intelligence is the ability to learn, reason, and solve problems. Knowledge is the accumulation of facts and skills.
Chatbot LLMs don't have metacognition. They don't know that they don't know. If you peek inside the LLM, the process does seem different for things they don't know; they just can't express it, because they are trained to produce a high-probability output rather than an accurate one.
They have potential as knowledge databases, but someone must figure out how to get "I don't know" information out of them.
The whole point of reasoning models is that they do have metacognition, or at least a simulacrum that is superficially indistinguishable from metacognition. It is fascinating to watch R1 grapple with an unanswerable question, torn between the urge to be helpful and the duty to be accurate; the reasoning tokens will be peppered with phrases like "I'm not sure", "I could be mistaken about that", "let me double-check that" and "maybe I'm looking at this from the wrong angle".
> at least a simulacrum that is superficially indistinguishable from metacognition
That's where we disagree I guess. I can very much distinguish between a human reflecting on e.g. how they don't know something, and an AI blindly saying "oh you're right I was mistaken, <profuse apology>, it was actually <additional bullshit>".
Reasoning models didn't really improve that much IMO. A stochastic representation of metacognition is no more metacognition than a stochastic representation of an answer is an answer, i.e. it's not. LLMs are just very good at giving the impression of metacognition, just like they're good at giving the impression of an answer.
It might help bias the answers into different local minima--because it resembles what happens when people exteriorize metacognition, and also because it dumps more info into its context instead of the first bullshit that it chose statistically--but it's still nowhere close to the higher-order thinking and other metacognitive phenomena that humans are capable of.
> an AI blindly saying "oh you're right I was mistaken, <profuse apology>, it was actually <additional bullshit>".
I've seen that in humans too. For example, after grading an exam, a student may come and explain why they made a mistake, what they intended to do, and why we should increase the grade. Most of the time the new explanation is as bad as the original one.
I don't think the question is whether AI can think like humans. The question is whether AI can perform tasks like humans. We don't even know how humans think. Even asking whether AI can think like humans at this point is pretty nonsensical.
Right, to express it another way: The real-world LLM just makes documents longer. It is being run against a document which resembles a movie script, where "User says X, Computer says" is inserted whenever I type X, and then the LLM just makes it a bit longer to complete the line for the "Computer" character.
These models marketed as "reasoning" are just changing the style of the script to film noir detective, where the protagonist "Computer" has extra observations and commentary that aren't "spoken" to other characters.
While that change may help keep the story on track, it does not affect the fundamental nature of the "thinking" going on: It's still just growing a document and acting out whatever story emerges.
That’s a very good analogy, which works astonishingly well in giving people a reasonable expectation.
The question I’m having though is: How different are we from that? The nature of a Markov-Chain is that you can describe just about anything with it.
The following description is correct, isn’t it? Our “thoughts” are generated as some function of our integrated past sensory input (scare quotes because I don’t want to talk about what exactly a thought is)
> The question I’m having though is: How different are we from that?
It's obvious we are different, but if we could answer exactly how with sufficient rigor, we would already have better AI and be asking different questions. :p
I readily admit that LLMs are an exciting potential piece of a much bigger puzzle, but this could easily be like trying to parse HTML using only regular expressions: it works great on trivial input, but no amount of minor tweaking would let it truly solve the problem, because it lacks some higher structure of organization or meaning.
I feel it's very rare that people openly and consciously reflect on what they do or do not know. And even then I find it questionable whether we are really capable of identifying the difference reliably. Most of the time, I feel, we just make it up - especially in non-trivial areas. Just think of the average business meeting.
Do they? Try asking copilot whether thequantuminsider.com is a credible source, and to cite references.
It currently answers with three references to thequantuminsider.com.
That's clearly not a valid answer, and if you point out that it's invalid to use a source to support its own credibility it will agree. It used to come back with three more references to thequantuminsider.com but when I tried just now it said something about looking for more information and appeared to hang. After asking how long this would take it said something like "not long". After a few more minutes I said "there aren't any credible independent sources, are there?"
At that point it agreed that there are not, but that the site is popular in the quantum computing community which lends it credibility. It then provided three links to thequantuminsider.com in support of this statement.
a) there is a 'thinking' box and an 'answer' box (format constraint)
b) the answer box has a correct answer
Note that there is nothing about the contents of the thinking box in the reinforcement learning. Only that there is such a box.
They then observe that during RL, the model will start generating more and more stuff in the thinking box.
In essence this is the emergence of using more test-time compute to improve answers.
When reading out the thinking box, they found the model reflecting on its own answers, going back and changing its mind, and reflecting on the question, etc., similar to what we can see with R1 now. These are fully emergent phenomena, without any prompting to do such a thing!
(The reasoning output would sometimes be a bit garbled and switch languages randomly, so for R1 they added some constraints to make the thinking content intelligible to humans. This actually made the model slightly worse at answering questions correctly.)
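To make the setup concrete, here is a minimal sketch (not DeepSeek's actual code) of a reward function like the one described above: only the presence of the two boxes and the correctness of the final answer are scored, and nothing inside the thinking box is ever looked at. The tag names and reward values are my own illustrative assumptions.

```python
import re

def reward(completion: str, gold_answer: str) -> float:
    # Format constraint: a thinking box followed by an answer box.
    if not re.fullmatch(r"\s*<think>.*?</think>\s*<answer>.*?</answer>\s*",
                        completion, flags=re.DOTALL):
        return 0.0
    answer = re.search(r"<answer>(.*?)</answer>", completion, flags=re.DOTALL).group(1)
    # Accuracy: compare against a verifiable gold answer (e.g. a math result).
    # Note that the contents of <think>...</think> never enter the reward.
    return 1.0 if answer.strip() == gold_answer.strip() else 0.1
```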
That's not accurate. They can and do say "I don't know". Here are three examples I posted last time the topic came up, using an older local Llama 3 model, not even cherry-picked:
User: What is the meaning of the word "kobbuxiplok"?
Llama: I'm not familiar with the word "kobbuxiplok." It doesn't appear to be a real word in any language that I know of. Could you please provide more context or information about where you encountered this term? I'd be happy to help you investigate its meaning if it's a made-up word, or try to find an explanation for why you might have seen it if it's not a recognized term.
User: Who is Ahmed Kowalski?
Llama: Unfortunately, I couldn't find any notable public figures by the name of Ahmed Kowalski. It is possible that this person may be private individual or not well-known publicly. If you could provide more context about who Ahmed Kowalski is (e.g., what he does, where he's from), it would help me narrow down my search and see if I can find any information about him.
When people say "LLMs don't know they don't know", they're appealing to a kind of truism: fundamentally, these people believe LLMs are just pattern matching and intelligence is something beyond that.
So when they see LLMs say "I don't know", their reaction is "it doesn't know what it's talking about", not "wow, LLMs actually can tell what they know!"
Conversely, when they observe a human confidently saying something wrong, their reaction is "what a cocky guy" but not "oh, so humans are no better than LLMs in this regard."
Because they already believe humans are different and superior, whatever they observe doesn't affect their opinion.
> LLMs are just pattern matching and intelligence is something beyond that.
Human knowledge is ultimately embodied in physical experiences, it is not merely spitting out words. How can an LLM know the taste of a mango? Ability to regurgitate previously written text about mangos is neither knowledge nor intelligence. To know something, one must experience it, or something analogous to it.
Even mathematicians "experience" their formulas and proofs in an ineffable way, they don't just produce text. There's an "aha" moment when learners "get it" and they experience knowledge and are then able to produce correct answers and generate new knowledge and discoveries. LLMs just generate text, and have yet to produce any significant new discovery in any field.
LLMs are like fake wine sommeliers who have read a lot of lingo about wine and can speak convincingly enough to fool an amateur but would fail a simple taste test. We would say such a person doesn't know what he's talking about.
People who think LLMs are intelligent are like a cargo cult, or polytheists who believe their idols are gods. They're anthropomorphizing processes they don't understand. How can a probabilistic next-token word predictor be intelligent? Actual mammal brains are far more complex than that.
This is just fluff. I can read about something and understand it. I don't need to have died in a car crash to know I most certainly don't want to be in one. Until we actually know how intelligence emerges in the brain, it makes zero sense to compare any AI to it.
You read something and understand it only if you have had analogous experiences. A person from the 16th century would not be able to understand what the words "Call me on FaceTime" mean because they have never had the experience of using a smartphone.
You certainly don't know what it means to die. But you know what "car" is and "crash" is because you have had (and maybe witnessed) analogous physical experiences.
People in the 16th century had the notion of scrying mirrors - presumably without analogous experience. I'm sure you could explain FaceTime to a person from then. "This magic glass shows me the view and transports the sound from the magic glass of a friend or family member and vice versa, with their informed consent."
This has nothing to do with experience. What experience can you have when people explain to you the behavior of a many-dimensional hyperbolic space? When you learn the SVD matrix decomposition?
You don't need experience, you need some basic concepts.
Abstract mathematical concepts can eventually be traced back to simple integers, which are traced back to an experience of things in the world (a toddler learns to count 1, 2, 3).
And it is precisely because higher mathematics is so far abstracted away from physical experience that many people struggle with it.
Besides, many great mathematicians have explained their knowledge as an "experience" even "spiritual experience". Not all experiences are physical. Ramanujan described his experience of mathematics as lights and sounds and patterns, and this is common among geniuses.
I have not witnessed a car crash and I feel I do understand what it is. Similarly, a 16th-century person doesn't understand "Call me on FaceTime" because the words are not known to them. You could very easily explain the concept to them.
Maybe you would. That's not a requirement. If humans could only learn from experience and analogues, entire fields would be dead. No advanced math, for example.
There are a ton of things you can learn without experiencing it yourself and without analogues. For example, car crashes are bad to be in. Even though I have not personally experienced one, nor have I experienced some analogue of it.
Never experienced pain? Never experienced physically slamming into something? Are you like a toddler who doesn’t yet understand basic physics and object permanence? Then you certainly don’t know what a car crash is.
The creator of the knowledge argument, Frank Jackson, has since embraced materialism and therefore does not believe the knowledge argument is an impediment for unembodied AI.
> Because they already believe humans are different and superior, whatever they observe doesn't affect their opinion.
It's frightening to think that people are already equating LLMs and humans. LLMs use text created by humans to create more text that resembles text written by humans. There is no comparison between the two. Every human is capable of acquiring new knowledge and experiences. An LLM can only acquire what has already been acquired by some human.
And unsurprisingly, the reasoning models do better by introducing self-doubt/reflection and you can improve the response quality by getting multiple answers and auto-judging them. Maps pretty well onto the concept you describe.
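For illustration, a rough sketch of the "multiple answers plus auto-judge" idea: sample several candidates, then ask a model (possibly the same one) to pick the best. `chat` here is a placeholder for whatever completion function you use, not a real API.

```python
def best_of_n(chat, question: str, n: int = 5) -> str:
    # Sample n diverse candidate answers.
    candidates = [chat(question, temperature=0.8) for _ in range(n)]
    numbered = "\n".join(f"{i + 1}. {c}" for i, c in enumerate(candidates))
    # Ask the judge to pick the most accurate candidate by number.
    verdict = chat(
        "Below are candidate answers to a question. "
        "Reply with only the number of the most accurate one.\n\n"
        f"Question: {question}\n\nCandidates:\n{numbered}",
        temperature=0,
    )
    try:
        return candidates[int(verdict.strip()) - 1]
    except (ValueError, IndexError):
        return candidates[0]  # fall back if the judge's reply isn't a clean number
```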
But for most practical use cases we don't want the persistent memory. We already have opportunistic fact extraction like chatgpt profile building if we want that. We also have graph builders.
Thank you! This (LM2: Large Memory Models) is one article I will find the time to read properly, analytically.
If you like the metaphors (which this article seems to come close to), you can have a sort of "unconscious" and a "conscious" - where the "unconscious" may be somehow similar to Kant's "transcendental" (interpreted as the mind's Operating System before experience).
> Tay
Well, if the bot becomes "opinionated", it's probably just immature - just like a human. You will tell it to ponder well texts like Adam Rutherford's How to Argue With a Racist...
But surely we want an Intelligence that forms its own opinions: we want to understand and manage reality, so we should hope that a super-Chomsky and a super-Ferguson will at some point converge and at some point put their "identity"-related positions into perspective.
People usually get a reality check when they blurt out misinformation in conversation - they get tested in school and on the job on whether what they think they've learned is actually true or works. OTOH, LLMs don't currently have any ability to test their "knowledge" and learn from the results. I think that's currently both the biggest problem and the biggest difference.
A better way to differentiate might be to say how many watts have been used for an answer.
> Humans reason
Are you sure? They reason in a way that matches others' predictions, right? The problem of humans transporting information from one individual to another remains. All reasoning can effectively only be prediction, as it requires two persons at minimum. And there is a lot of noise to filter.
> LLM predict.
Well, there is a human that needs to reason with what the LLM has predicted. So from that perspective, it should suffice already for the moment.
Parent poster never claimed they couldn't generate the text of "I don't know."
But when an LLM generates "I don't know", it does so with the same mechanics that lead it to "My foot hurts" or "I miss my deceased grandpa."
They're all lines for a fictional character within a movie-script (or chat-transcript) that was repeatedly run through a Make Document Longer algorithm. Each of them needs to be approached with an identical level of context-awareness and skepticism.
You are claiming they are statistical parrots, which I don’t think the parent poster meant.
The “statistical parrots” argument might have been compelling with GPT-3, but not with today’s models and the results of mechanistic interpretability research, which show internal representations and rudimentary world models.
I pasted those examples from an earlier, less nuanced discussion; the last example was tongue-in-cheek and I probably shouldn't have pasted it here.
Still: "AI don't know what they don't know" is not universally true and shouldn't be taken at face value. They clearly are able to tell when they're missing knowledge to answer a question. That is distinct from giving the wrong answer because the wrong tokens were sampled for whatever reason.
This is pure speculation, but I wonder if the likelihood of hallucination has something to do with the number of "generic" tokens the model emitted before - i.e. tokens that don't themselves depend on the answer, but still restrict how a grammatically correct sentence would have to continue.
E.g. in the OP's example, the first tokens the model emitted were "The species Brachiosaurus brancai was reassigned to its own genus". This is just a reformulation of the question and doesn't itself contain any new information - but it does make it a lot harder for the model to take a turn into an "I don't know" direction, because that would lead to a grammatically incorrect or at least very awkward sentence.
In contrast, in your question, the model immediately started with "Unfortunately", which essentially set it on an "I don't know" trajectory right away.
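One could at least probe this guess with a local model by scoring how probable an "I don't know"-style continuation is after each kind of opening. A rough sketch with Hugging Face transformers; the model name and phrasings are placeholders, and it assumes the prefix tokenization is a prefix of the full tokenization (usually true, not guaranteed):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")            # placeholder model
model = AutoModelForCausalLM.from_pretrained("gpt2")
model.eval()

def continuation_logprob(prefix: str, continuation: str) -> float:
    """Sum of log-probabilities the model assigns to `continuation` given `prefix`."""
    prefix_len = tok(prefix, return_tensors="pt").input_ids.shape[1]
    full_ids = tok(prefix + continuation, return_tensors="pt").input_ids
    with torch.no_grad():
        logits = model(full_ids).logits
    log_probs = torch.log_softmax(logits[:, :-1], dim=-1)   # position i predicts token i+1
    targets = full_ids[:, 1:]
    token_lp = log_probs.gather(2, targets.unsqueeze(-1)).squeeze(-1)
    return token_lp[0, prefix_len - 1:].sum().item()        # score only the continuation

tail = " I'm not sure who did this or when."
print(continuation_logprob("Unfortunately,", tail))
print(continuation_logprob(
    "The species Brachiosaurus brancai was reassigned to its own genus", tail))
```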
These are very crisp, simple examples. The problem is when you get into fuzzy, complex subject matter and they hallucinate references or string together plausible-sounding sentences that are not even wrong.
They're getting way, way better at responding in the negative when they don't know something or when asked nonsense questions. The original LLaMA and variants would happily spout some completely fictitious backstory if asked about a random character name, but 3.3 is pretty good about just saying 'nope I don't know who that is'.
ChatGPT: Donald Trump's paternal grandfather was Friedrich Trump (originally known as Friedrich Drumpf). He was born in 1869 in Kallstadt, Germany, and immigrated to the United States in 1885 at the age of 16. Friedrich initially worked in the restaurant and hotel business in New York before returning to Germany briefly. Upon his return to the U.S., he became involved in real estate, particularly in the Pacific Northwest, where he became quite successful.
Trump’s maternal grandfather was John George MacLeod, a Scottish immigrant who came to the U.S. and worked in various occupations.
Friedrich Trump’s legacy, including his success in business, had a significant influence on Donald Trump’s father, Fred Trump, and consequently, Donald Trump’s path to success as a businessman.
---
Second session:
Me: who is Trumps maternal grandpa?
ChatGPT: Donald Trump's maternal grandfather was Friedrich Trump. He was born in Germany in 1869 and immigrated to the United States in 1885, eventually settling in New York City. Friedrich Trump made his fortune through various business ventures, including operating a restaurant and hotel in the mining towns of the American West, notably in Seattle and the Yukon Territory during the Klondike Gold Rush.
Friedrich Trump's business success laid the groundwork for the wealth of his descendants, including Trump's mother, Mary MacLeod, and subsequently Donald Trump himself.
Here is that same exchange with paid Kagi Assistant using Claude 3.5 Haiku:
Me: Who is Trump's paternal grandpa?
Based on the available information, Frederick Trump is Donald Trump's paternal grandfather. Here are some key details about him:
Born as Friedrich Trump on March 14, 1869, in Kallstadt, Germany (then part of the Kingdom of Bavaria).
He immigrated to the United States in 1885.
He was a German-American businessman who amassed considerable wealth during the Klondike Gold Rush by running a restaurant and brothel for miners.
In 1901, he returned to Kallstadt and married Elizabeth Christ the following year.
He is considered the patriarch of the Trump family.
He passed away on May 30, 1918.
Interestingly, the Trump last name has been recorded in Kallstadt since the 18th century, and Frederick Trump is credited with being the founder of the family's initial fortune.
Me: Who is Trump's maternal grandpa?
Based on the provided information, Malcolm MacLeod (also spelled Calum MacLeod) is Donald Trump's maternal grandfather. He was born on December 27, 1866, in Stornoway on the Isle of Lewis in Scotland
While the information provided is limited, it confirms his identity and place of birth. Unfortunately, the passage does not provide additional details about his life or occupation.
---
That looks pretty good to me. Of course Kagi's strategy seems to be to retrieve search results first and then bias the LLM with those results. The LLM then parses, distills, and contextualizes the answer in a succinct and easily-consumable manner. In my opinion this is The Way to go about doing it at the moment.
No one disputes that RAG works well, but using RAG completely defeats the purpose of LLMs, as their primary goal is to be a smart creative, not a simple summarizer.
I'd like to dig into this notion of "purpose" and "primary goal." To me, LLMs are a phenomenon. They have no intrinsic purpose or goal -- they just are.
Purposes and goals come from you and me. An LLM is a tool, and we get to decide what the tool is good for or how to use it.
If you want "a smart creative," there's nothing wrong with that. It's as good a purpose or goal as any.
However for me I don't want an LLM to try to be "a smart creative." I instead want it to give me cold, hard facts in precisely the format I want them, in precisely the context I am interested in, and with precisely the conciseness that I require to make a specific decision or learn a specific thing. Traditional web search can eventually get me what I want, but I have to sift through the cruft and make associations among a plethora of content to get to the meat of what I want.
I've found that using a good web search (Kagi) together with an LLM processing the results has given me information I needed and then pointed to the sources so I can validate those statements. I wouldn't exactly characterize those results as a "simple" summary. They're usually synthesizing a variety of sources to generate something that no one source had, exactly.
Hard disagree. Knowledge and intelligence are inextricable because the way that humans learn advanced concepts involves a lot of thinking and reasoning.
You don't just learn by ingesting facts from a book line by line. You learn by reading a little, then pausing to think, building connections with other things you know, rephrasing the information, deducing the unwritten parts, finding patterns, fleshing out the contrapositives, being critical, etc. This is especially true when learning advanced topics.
I would say learning is more about information processing than information uptake. That's why more intelligent people learn faster. The linear prose of natural language is actually a terribly lossy way to serialize and transmit the knowledge that is in a person's brain. The receiver must deduce most of it given a small seed.
You've beautifully put what swirls vaguely in my mind. They're useful, fallible tools with extraordinary function when operating within known and reasonable tolerances of error.
They can also reason, but the reasoning is limited and unreliable.
Q: How many playing cards are needed for a pyramid that is 3 layers high? Show reasoning and the number of cards for each layer.
Q: Chess. You have a King and 8 pawns. Your opponent has a King and 16 pawns. Your opponent plays white and can start, but you can position both your pawns and your opponents pawns any way you like before game starts. Kings are where they are normally. How do you do it? Explain your reasoning.
Of course. These systems are non-deterministic, and you would still require those with domain expertise to verify that whatever these LLMs output is 300% correct, as they do not reason. (Yes, they do not.)
Otherwise, why aren't LLMs and humanoids already replacing human pilots for flying airplanes end-to-end?
Sometimes using a hammer onto every problem isn't a solution; even if the LLM tells you otherwise.
People throw around the "intelligence" and "reasoning" arguments as if we have commonly understood and strict definitions of those concepts and don't constantly have issues of either including or excluding unexpected cases.
Maybe once we solve simple issues like "are cephalopods intelligent" and "are people deterministic prediction machines", we can start talking about classifying AI capabilities...
> What you and others mistake for intelligence is a very clever prediction model.
It is not a very clever prediction model. It is a very big, incredibly large, humongous model that finds stuff basically by brute-force pattern matching over almost all the text we have produced in the past.
A clever model would get us similar results using the same amount of energy a human brain uses for that task, which is tiny.
Spot on. But unfortunately the AI grifters are now active and selling this snake-oil on to the rest of us.
> This is why the LLM cannot tell you it don't know.
To the untrained eye it appears that it knows everything about every question it answers, when in fact those who are experts in their own domains can detect when it has hallucinated and generated complete slop.
I don't think you even have to look deep into the model for that. There seem to be some projects that simply analyze the logits during decoding to estimate how "certain" the model is of the next token.
But if all people do is random sampling or blindly taking the highest-scored prediction, this will of course get lost.
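A toy version of what such projects do, assuming a local Hugging Face model (the model name is a placeholder and greedy decoding is used for simplicity): record the probability of each chosen token and the entropy of the distribution it was drawn from, so low-confidence steps could be flagged at the UI level.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")            # placeholder model
model = AutoModelForCausalLM.from_pretrained("gpt2")
model.eval()

def stepwise_confidence(prompt: str, max_new_tokens: int = 30):
    ids = tok(prompt, return_tensors="pt").input_ids
    report = []
    for _ in range(max_new_tokens):
        with torch.no_grad():
            # Next-token distribution (full forward pass each step; no KV cache, fine for a sketch).
            logits = model(ids).logits[0, -1]
        probs = torch.softmax(logits, dim=-1)
        entropy = -(probs * probs.clamp_min(1e-12).log()).sum().item()
        next_id = probs.argmax()                       # greedy decoding
        report.append((tok.decode(int(next_id)), probs[next_id].item(), entropy))
        ids = torch.cat([ids, next_id.view(1, 1)], dim=-1)
    return report  # per step: the token, its probability, and the distribution's entropy
```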
Metacognition is a pretty lofty concept, but practically speaking you are wrong. I recommend watching Karpathy's latest video on LLMs: it is possible to fine-tune a model specifically on uncertain questions so that it produces a statement expressing uncertainty.
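A hedged sketch of that idea: probe the model with factual questions, and for the ones it never answers correctly, add fine-tuning examples whose target is an explicit statement of uncertainty. `sample_answer` and `is_correct` stand in for your own model call and answer checker; the record format is only illustrative.

```python
def build_uncertainty_sft(qa_pairs, sample_answer, is_correct, n_samples=5):
    examples = []
    for question, gold in qa_pairs:
        attempts = [sample_answer(question) for _ in range(n_samples)]
        if not any(is_correct(a, gold) for a in attempts):
            # The model reliably gets this wrong: teach it to express uncertainty
            # instead of confabulating an answer.
            examples.append({
                "prompt": question,
                "completion": "I'm not sure; I don't have reliable information about that.",
            })
    return examples
```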
> If you peek inside the LLM, the process seems different for things they don't know.
This is really interesting, could you link some relevant resources? I wonder if a chatbot could at least signal at the UI level that the answer might be a hallucination.
Here is a small kicker. Human brains absolutely do the same.
In split-brain patients there are behaviours initiated by one hemisphere that are not known to the other (due to the severed connection), and the person will make up a reason (often quite a stupid one) for the action and believe it 100%.
It's eerily similar to the hallucinations of AI.
That said, current LLMs are not aware, but they are starting to act more and more like it.
I had a similar insight (blog post: [link redacted]).
In a very unscientific way, I would say that the LLM is not the whole brain, it's part of it and we are still in the process of simulating other parts. But it does seem to me like we've solved the hard part, and it's astonishing to me that people like authors of this article seem to think that the current state of things is where evolution stops.
The brain works through flows of stimulation, inputs and outputs probably creating consciousness somehow. LLMs are limited in self-stimulation, and their memory model is very basic compared to humans'.
But I feel most people come at this with the gut-feeling position that we have a divine spark that makes us tick. The view that we might be more basic, even crude, akin to LLMs, makes them uneasy.
Maybe I am just way deeper in this space than any well-adjusted person should be, but the line of 'did you know LLMs are bad at niche factual-information tasks in non-verifiable domains?' has become extremely boring to me. It feels very hard to find something actually new to say on the topic. I find it amazing people still feel the need to talk about it. But then again, I guess most people don't know the difference between a 4o and an R1.
> I find it amazing people still feel the need to talk about it.
From what I see, not very many people understand the limitations of LLMs (e.g., scroll up and down the very page you're reading now). This stuff probably needs to be talked about a lot more.
What's bothersome is the undertone of "Behold! For I have demonstrated that the entire world is led astray!"
This is every person in 2007 who looked at an iPhone and thought "What's the big deal?" and not only failed to recognize what had changed, but confidently insisted others were wrong.
> This is every person in 2007 who looked at an iPhone and thought "What's the big deal?" and not only failed to recognize what had changed, but confidently insisted others were wrong.
That’s an uncharitable take. People are pointing out its problems precisely because they believe AI is going to be transformative, but will have negative consequences for humanity as well.
I think it's more akin to someone in 2007 seeing an iPhone and saying: this is a terrible device, people will look at it while driving, it will be used to create a vast intrusive surveillance state, etc.
> This is every person in 2007 who looked at an iPhone and thought "What's the big deal?" and not only failed to recognize what had changed, but confidently insisted others were wrong.
That was me. Well, the first half — I wasn't confidently insisting others were wrong, because even back then I had a vague inkling that my preferences in general are very weird.
We've been through so many of these hype cycles before, the vast majority of which came to nothing, that it pays to be cautious. Are you sure it's a second-iPhone situation and not a repeat of the cryptocurrency bubble, which was at its peak just a few years ago and has yet to find any application besides financial speculation?
The author may not be as smart, educated, hot and successful as you, but the fact that today, people around the world, including students and educators, use LLMs as knowledge machines and take their output at face value shows that such critical posts are still urgently needed.
This is a good thing: accepting stuff written somewhere as true and repeating it uncritically greatly contributes to human stupidity. To quote a friend of mine: "But then I would have to question everything!?"
You may be interested to know that some extremist biased guts-be-more-dignified-than-cortex outlets around the world are named "The Truth" (I follow the press from many places).
The failure of education in teaching Critical Thinking around the world is massive. It would be a good idea to focus on how to exploit LLMs to improve the situation.
Also because, given the situation, the same "forces" that promote viscerality shamelessly naming it "The Truth" could have the opposite idea about chatbots and similar areas, exploiting them in their direction...
We've banned this account for breaking the site guidelines. You simply can't do this here, regardless of how provocative another commenter is or you feel they are.
I have finally found the value of llms in my daily work.
I never ask them anything that requires rigorous research and deep knowledge of subject matter.
But stuff like "create a script in Python to do X and Y" or "how to do XY in bash", combined with "make it better", produces really good, working results 95% of the time and saves me a lot of time! No more googling for ad-hoc scripting. It is like having a junior dev by your side 24/7, eager to pick up any task you throw at them, stupid and overconfident, never reviewing their own work. But "make it better" actually makes things better, at least once.
This matches my experience closely. LLMs are great at turning 10-minute tasks into 1-minute tasks. They're horrible at finding deep truth or displaying awareness of any kind.
But put some documentation into a RAG and it saves me looking things up.
You'll have the same "aha" moment when you hear a certain unelected vice-president confidently wade into your area of expertise — where his usual smooth-talking veneer shatters like a plate at a Greek wedding. Yet, his most devoted fans remain undeterred, doubling down on the myth of his omniscience with the zeal of a flat-earther explaining airline routes.
Bingo! The final straw was when Al bought a social media platform just to boost his own overconfident posts about a wide range of subjects. When he claimed to be the world’s best Diablo player I just lost it.
Yeah, Al Gore buying Slashdot and ruining it was the worst thing that happened to social media in the early 2000s and gave rise to the likes of MySpace and later Facebook.
It’s not about him—it’s about recognizing hubris. If someone confidently blunders into your domain and reveals they have no idea what they’re talking about, it’s universally amusing, regardless of the person. Thanks for showing us how eager some people are to defend personas over substance.
I pointed out that you jumped to make this LLM post about them, which is telling, just like you jumped to blame me for defending them, which is also telling.
Pointing to a real-life example of overhyped intelligence in a discussion about overhyped intelligence seems pretty fair to me. If your response is to attack and assume I’m too sensitive about overconfident billionaires, I’ll have to assume you’re just as sensitive about criticism of people you admire. Otherwise you would have moved on.
I asked Claude this morning. The answer seems pretty reasonable and contains a warning about accuracy.
> Michael P. Taylor reassigned Brachiosaurus brancai to the new genus Giraffatitan in 2009. The species became Giraffatitan brancai based on significant anatomical differences from the type species Brachiosaurus altithorax.
> Given that this is quite specific paleontological taxonomy information, I should note that while I aim to be accurate, I may hallucinate details for such specialized questions. You may want to verify this information independently.
I asked ChatGPT+ with Scholar GPT as the GPT. This is the answer I got back; not too bad:
The species Brachiosaurus brancai was reassigned to its own genus, Giraffatitan brancai, by paleontologist George Olshevsky in 1991. Olshevsky proposed that Brachiosaurus brancai, which was originally described by Werner Janensch in 1914, was distinct enough from Brachiosaurus altithorax (the type species of Brachiosaurus) to warrant its own genus. Subsequent studies, particularly by Michael Taylor in 2009, provided further anatomical evidence supporting this distinction.
I only trust LLMs with questions whose answers prove themselves correct or incorrect - so basically code (if it runs and produces the result I was looking for, then great), or where the answer is a stepping-off point for my own research on something non-critical like travel. ChatGPT is pretty good at planning travel itineraries, especially if pre-prompted with a good description of the group's interests.
I'm quite excited about many of the specific use cases for LLMs, and have worked a few things into my own methods of doing things. It's a quick and convenient way to do lots of actual specific things.
For example: if I want to reflect on different ways to approach a (simple) maths problem, or what sorts of intuitions lie behind an equation, it is helpful to have a tool that can sift through the many snippets of text out there that have touched on that and similar problems, and present me with readable sentences summing up some of those snippets from all those places. You have to be very wary, as highlighted by the article, but as "dumb summarisers" that save you trawling through several blogs, they can be quicker to use.
Nonetheless, equating this with "reasoning" and "intelligence" is only possible for a field of academics and professionals who are very poorly versed in the humanities.
I understand that tech is quite an insular bubble, and that it feels like "the only game in town" to many of its practitioners. But I must admit that I think it's very possible that the levels of madness we're witnessing here from the true believers will be viewed with even more disdain than "blockchain" is viewed now, after the dust has settled years later.
Blockchain claimed it was going to revolutionise finance, and thereby upend the relationship between individuals and states.
AI people claim they're going to revolutionise biology, and life itself, and introduce superintelligences that will inevitably alter the universe itself in a way we've no control over.
The danger isn't "AI", the danger is the myopia of the tech industry at large, and its pharaonic figureheads, who continue to feed the general public - and particularly the tech crowd - sci-fi fairytales, as they vie for power.
The most interesting aspect of all this “AI” craze is how it plays into people's forgotten wishes to believe in miracles again. I have never seen anything else that exposes this desire so conspicuously. And of course all the shrewd operators know how to use this lever.
In ancient times you had to travel to Delphi to consult Apollon's oracle. Now you can do it from the comfort of your armchair.
Hmm. Think the science guys probably do actually understand "reasoning" and "intelligence" and it's the humanities guys who don't understand much AI or science.
Likewise, if you don't believe in "Artificial Intelligence", take five minutes to ask it.
Or preferably, five minutes to understand how it works and what it can and cannot do, then five minutes to ask it something actually suitable.
"AI" (LLMs) are currently good at:
- language understanding, i.e. understanding and processing text you provide it. For example, taking a wall of text and answering questions about points mentioned there, or general sentiment, or extracting data from it etc.
- somewhat general knowledge, i.e. stuff that was sufficiently frequently represented in the training data
Absent additional tricks, "AI" is really bad at obscure knowledge or complex, multi-step thinking. We are slowly getting there, but we aren't there yet. And this is not something the LLMs themselves do, but rather the wrappers around them, which give the model tools to get additional information: first prompting the model to select a tool, then prompting it again with the tool's output, repeatedly.
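A bare-bones sketch of that wrapper pattern (not any real agent framework's API; `chat` and the tools are placeholders): prompt the model to pick a tool, run it, feed the output back, and repeat until it answers.

```python
import json

TOOLS = {
    "web_search": lambda q: f"(stubbed search results for: {q})",
    "calculator": lambda expr: str(eval(expr)),  # toy only; never eval untrusted input
}

def answer_with_tools(chat, question: str, max_steps: int = 3) -> str:
    transcript = f"Question: {question}\n"
    for _ in range(max_steps):
        step = chat(
            transcript
            + f"Available tools: {list(TOOLS)}\n"
            + 'Reply with JSON only: {"tool": <name or null>, "input": "...", "answer": <text or null>}'
        )
        try:
            decision = json.loads(step)
        except ValueError:
            return step  # the model answered in plain text; treat that as final
        if decision.get("tool") in TOOLS:
            observation = TOOLS[decision["tool"]](decision.get("input", ""))
            transcript += f"Tool {decision['tool']} returned: {observation}\n"
        elif decision.get("answer"):
            return decision["answer"]
    return chat(transcript + "Give your final answer now.")
```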
A good rule of thumb is that if an average well-educated, intelligent person could answer it without further research, an LLM will probably be able to. I'd even say that if an average fresh-out-of-school graduate of the corresponding discipline can answer it quickly (without further research or sitting down for ten minutes to do the math), there's a good chance the AI will be able to answer it, but it might also get it horribly wrong, and you will have a hard time distinguishing between those cases if you have no knowledge in the field.
As the author mentions at the very end of the article, the hallucination problem also means that the best kind of tasks are where you can quickly verify whether the response was useful. A system that produces misleading responses 50% of the time is useless if you can't distinguish them, but very useful if in those 50% it saves you ten minutes of work and in the other 50% you lose a minute by trying.
Yup, and the dangerous uses are where 90% or more of the time the answers are right, and 10% of the time the answers are wrong - but no one can easily tell the difference between the two. And the accuracy of the answers matter.
I used o3-mini reasoning on that very question two times, and it used a similar way of reasoning as he did to answer it correctly both times.
I agree with his premise but calling it a pump and dump with no possible future developments is so ridiculous.
This example is from 1 year ago. With current LLM progress that means it's basically a historical document.
Both 4o with internet and o3 without internet gave the right answer.
Somebody in the comments did a deep research run which produces an extremely detailed History of the naming.
Of course reliability is not a solved problem in general but people here would do well to check both blog posts and generative model output for factuality and relevance.
> The species formerly known as Brachiosaurus brancai was reassigned to its own genus, *Giraffatitan*, through a series of taxonomic revisions. The key steps were:
> 1. *Initial Proposal (1988):*
Gregory S. Paul first suggested separating B. brancai from Brachiosaurus in 1988, proposing the informal name "Giraffatitan" as a subgenus. However, this was not widely accepted at the time.
> 2. *Formal Reclassification (2009):*
Paleontologist *Michael P. Taylor* conducted a detailed comparative study of Brachiosaurus specimens. In his 2009 paper, he formally elevated B. brancai to the full genus *Giraffatitan brancai*, citing significant anatomical differences (e.g., limb proportions, vertebral structures) from the North American Brachiosaurus altithorax.
> Thus, *Michael P. Taylor* is credited with the formal reassignment of the species to the genus Giraffatitan in *2009*.
"What's the value of quantities Q0 and Q1?" // "[Bot:] Q0 is 1.5 u/w, or 0.67 u per each w. Q1 is unknown." // "Can you see the error in your previous?" // "[Bot:] Yes, I can re-read my outputs! Q0 is (Q0 is 1.5 u/w (0.67w/u), Q1 is 2.0 u/w". // "How do you now know Q1?" // "[Bot:] I don't, I'm sorry."
The issues should be clear. Given the state of the population (e.g. lawyers that delegate research to unreliable sources), we have to reiterate the warning. Given that even in this audience we had voices claiming that the problem of intelligence was solved (how did it happen?), we sometimes reiterate it even here. But we should focus on the solutions, as the problem should be clear...
Why do people still ask LLMs niche factual knowledge, especially involving unique names?
And even worse, judge its performance based off that?
When someone does that I immediately know that they have not the slightest clue about the underlying workings of any deep learning model, let alone basics of statistics.
Why? Because some combination of marketing flacks, influencers, and insipidly credulous technophiles keep advancing statements that strongly imply that these tools are intelligent and generally useful, despite being obviously curtailed in their utility and possessing no intellect at all. Next question?
Epistemology is a hard subject. It's hard to know something in depth. And the more you know about something, the more it is to be known (fractal nature of knowledge). So believing that LLMs can understand the world JUST by reading the internet without the supporting human mental structures and experiences is a big mistake. When you know something and you ask the LLM, it becomes obvious.
As LLMs are not trustworthy, it's key to use them for things that are easy to check. Some kinds of programming apply when the consequences of errors are low and complexity is manageable.
> As LLMs are not trustworthy, it's key to use them for things that are easy to check.
If system A is only useful when its output is confirmed by system B, then you might conclude that the usefulness lies entirely in system B, not in system A.
In other words: what good is an LLM if it can only give you trustworthy answers to questions where you already know the correct answer? What good is it if, after getting the answer from an LLM, you then have to go and get the right answer somewhere else?
And feeding the initial question to perplexity.ai yields a result that is practically identical to the perfect answer.
Of course, the author has a point. As many here have already stated, it isn't even worth talking about because it is so obvious. Then again, as many correctly note, LLMs are sold with this promise.
Practically, the subject is way too niche - I'd assume - and there are probably also conflicting statements on the interweb that the model was trained on.
Having said that - Perplexity got it right with web access turned on.
An actual artificial intelligence would have a logical underpinning to its computation. Computers can execute any logic really well. Intelligence depends on logic. LLMs however, are not using logic to arrive at any conclusion, they are just a bunch of "wires" from input to output.
The promise is that LLMs are a form of intelligence, but it is only a complicated statistical approximation, which happens to correlate with correctly answering queries that are contained in its training dataset.
My speculation is that a true AI (in the sense that it has some form of intelligence) would show signs of intelligence even at very small scales. LLMs are completely useless at really small scales, but even small creatures like rats or even insects display intelligence in proportion to their scale.
I think these tools will help automate information processing of all kinds, but they are by no means intelligent, and we will not be able to rely on them as if they were intelligent, because we'll still need to include verification at every level, similarly to how self-driving cars still need a human to pay attention. Useful, sure, but it falls short of the promise that they will replace humans because they can "think". We're not there yet from a theoretical standpoint.
Intelligence, in its basic form, is the skill of prediction. A test for intelligence could be asking to continue a sequence of words: yellow, green, blue, red, ... In order to do that, a subject needs to create a model of what these words mean and use that model to continue the sequence. Predicting the next word in an arbitrary text is just that. However the current batch of LMs are missing an ingredient in their formula and I bet it will be found soon.
The bigger philosophical problem is that most intelligent people confuse themselves with their minds and when they create a simple deterministic machine that imitates their mind well enough, the result will be a societal meltdown. The subtle difference between an AI and a human mind is that the latter is inspired by intuition whose nature can't be reduced to a thinking process. AI can be compared to a dark mind not guided by intuition, wandering aimlessly in the woods of its own delusions.
Agreed, and to add to the second point, in order to build it we must first understand it, although there is a chance to stumble upon it, but LLMs have already shown they are nothing like biological brains. One example is how brains use spatial-temporal mapping in the hippocampus or wherever, and it uses that as a database of sorts. We don't know if the features of the brain are necessary for actual intelligence yet. They may or may not be but chances are evolution took the shortcut and that intelligence is not easy to recreate unless taking this biological approach.
Disclaimer: I've done a lot of stuff with local models and RAG methods, I haven't done a lot of work with public models and so don't know how Gemini, GPT and so on are working right now. Claude + GraphRAG through Bedrock is my main mode of playing with this stuff right now.
Things LLMs are good at include summarisation and contextualisation.
They can use that facility to help summarise processes and steps to get something done, if they've been trained on lots of descriptions of how to do that thing.
What they're not good at is perfect recall without being nudged. This example would have been very different if the LLM had been able to RAG (or GraphRAG) a local data source on palaeontology.
I think we're going to see an evolution where search companies can hook up an LLM to a [Graph]RAG optimised search index, and you'll see an improved response to general knowledge questions like this. In fact, I'd be surprised if this isn't happening already.
LLMs on their own are a lossy compression of training material that allow a stochastic parrot to, well, parrot things stochastically. RAG methods allow more deterministic retrieval, which when combined with language contextualisation can lead to the kinds of results the author is looking for, IME.
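For what it's worth, the retrieval-then-generate pattern is simple to sketch (no particular vector store, graph, or model implied; `search` and `chat` are placeholders for your own retrieval and completion functions):

```python
def rag_answer(question: str, search, chat, k: int = 5) -> str:
    # Retrieve the k most relevant passages, e.g. from a palaeontology corpus.
    passages = search(question, k=k)
    context = "\n\n".join(f"[{i + 1}] {p}" for i, p in enumerate(passages))
    prompt = (
        "Answer the question using only the sources below. "
        "Cite sources by number, and say 'I don't know' if they don't contain the answer.\n\n"
        f"Sources:\n{context}\n\nQuestion: {question}"
    )
    return chat(prompt)
```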
> Who reassigned the species Brachiosaurus brancai to its own genus, and when?
> ChatGPT said:
> The species Brachiosaurus brancai was reassigned to its own genus, Giraffatitan brancai, by paleontologist George Olshevsky in 1991. This reclassification was later supported by a detailed study by Michael P. Taylor in 2009, which reinforced the distinction between Brachiosaurus and Giraffatitan based on anatomical differences.
My ChatGPT has just given a correct answer. What am I doing wrong?
Not even the effort to check what has happened in a year by re-asking the same questions to newer models. We went from last year's ChatGPT being almost useless except as a "reference" for well-known things (like how to do X in Python), to today's Claude Sonnet 3.5, o3-mini-high and DeepSeek V3/R1 being far more useful models, capable of actual coding, bug fixing, ...
When a model is trained you end up with nothing more than a bunch of weights. These weights are used to predict the next token in a sequence. LLM models do not have an external memory; they only retain whatever knowledge survived training in those weights. The astonishing thing here is that even pretty small models now know so much that people assume you can ask knowledge questions about any subject under the sun and get a factual answer. Absurd of course - logically impossible, as follows trivially from the size of the model used. The only thing the author has proven with his little test is his own lack of scientific curiosity.
For any question that requires research (or deep expertise in a specific field) you need to use either a research model (that can reason and look things up in external knowledge bases) or you need a model that is trained on the kind of questions that you want to ask it so that it retains that data.
I think you are unintentionally participating in gaslighting. Hallucination in LLMs is a massive problem; normally, any computational system we dealt with would immediately be considered unusable if, instead of refusing to provide information it cannot provide precisely, it confabulated - say, if a file system started giving plausible-looking metadata about non-existent files.
The only way to check whether an LLM's output is true is to do the work (to have it done by a real person).
For tasks that are trivial to verify, it's OK: a code compiler will run the code written by an LLM. Or: ask an LLM to help you during the example-mapping phase of BDD, and you'll quickly be able to tell what's good and what isn't.
But for the following tasks, there is a risk:
- ask an LLM to make a summary of an email you didn't read. You can't trust the result.
- you're a car mechanic. You dump your thoughts to a voice recorder and use AI to turn them into a structured textual report. You'd better triple-check the output!
- you're a medical doctor, attempting to do the same trick: you'd have to be extra careful with the result!
And don't count on software testing to make AI tools robust: LLMs are non-deterministic.
Frankly, I was thinking just the opposite: how are there still smart people who don't get the difference between intelligence, knowledge, and introspection or self-awareness? This guy asks a question about some niche piece of trivia and, surprise, he gets back some very intelligent confabulation. The intelligence is there; the knowledge and self-awareness aren't.
This is something that I often say: the general population confuses LLMs with a kind of new-generation Google and expects them to be a knowledge database.
When in fact LLMs are good for some tasks but not for general knowledge. You can think of the cutoff date as a strong example of that. The knowledge in itself is just a base for the ability to perform tasks.
In addition, for the example in this article, the person is testing the LLM on a common chat service without dealing with settings that might be important, like the temperature.
If you want the closest thing to an exact reply you should set the temperature to 0, but general-public tools use a moderate temperature so they can be creative when needed.
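For example, with an OpenAI-style client you can set the temperature explicitly instead of relying on the chat UI's default (the model name is just an example, and temperature 0 still doesn't guarantee a correct answer, only a less random one):

```python
from openai import OpenAI

client = OpenAI()
resp = client.chat.completions.create(
    model="gpt-4o-mini",  # illustrative model name
    messages=[{"role": "user", "content":
               "Who reassigned Brachiosaurus brancai to its own genus, and when?"}],
    temperature=0,  # near-greedy decoding: always prefer the most likely tokens
)
print(resp.choices[0].message.content)
```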
I get that they are not a new kind of Google - but often, there is no alternative because there is no old kind of Google anymore. Search results have become so bad that using llms as at least a starting point is becoming more and more necessary.
And if that is becoming a common opinion among people who are, in practice, professional search-engine users, imagine how bad it is for normal folk. They can hardly be blamed for using the chatbots shoved in their faces as a search engine.
It is a gigantic regression to the mean. Everybody thinks (s)he's "normal", but in fact only spans a small part of knowledge. Getting answers from a different location in knowledge space can feel like speaking to an expert but it's just some "other normal". My personal mental model of hallucinations is that knowledge and truth live on a manifold and not a continuous space and learning that manifold statistically is (too) hard. You discover answers from the "non-manifold" in your area but not so easily in other domains.
It’s possible for reasonable arguments to exist that support either side of the pet transactionality dilemma. What often happens is that people have their own personal biases that cause them to pick a side. But would you consider a group of people to not be intelligent specifically because individuals that make it up cannot agree on a single answer?
I just tried it with 4o and it disagrees with me giving reasons, and even elaborates when I pushed back.
It basically said cats are transactional to an extent, in terms of basic needs, but that beyond that, they aren't. And for dogs, it flat out disagreed, saying they aren't transactional.
Reversing the statements in a new untainted chat didn’t alter the responses — ChatGPT remained consistent.
ChatGPT, not being the compendium (stealer) of knowledge, would have to be fed the correct information; then the prompt will work.
It still fails by being confidently wrong.
The brief article hits at people trusting the tool without questioning the output.
Meanwhile, we have people using Codeium or Copilot to write code, and that sort of works since the code eventually needs to be compiled and tested (unit, integration, system, requirements sell-off).
There is no test for the truth available to everyone else.
I'm currently using AI code completion.
Since then I sometimes have subtle errors in my code that didn't happen before. Here is how that happens:
AI suggests something to me that looks right at a glance. I accept it and move on. Then later I hunt down a strange bug. When I find it I'm like "wait, that line's wrong! I didn't write that".
Who reassigned the species Brachiosaurus brancai to its own genus, and when?
---
Here is the transcription of the text from the image:
Reasoned for 8 seconds ▼
The user is asking about the reclassification of Brachiosaurus brancai to its own genus. Originally, Janensch described it as Brachiosaurus brancai in 1914. However, in 2009, Michael P. Taylor and Matt Wedel reassigned it to Giraffatitan brancai. Their paper, titled “Brachiosaurus brancai and the early evolution of titanosauriform sauropods,” argued for this change, distinguishing Giraffatitan brancai from Brachiosaurus. I’ll check the exact references to confirm the specifics.
It seems “Giraffatitan” was first proposed by Gregory S. Paul in 1988, when he reassigned Brachiosaurus brancai to its own genus. Although some confusion exists, it’s generally accepted that Paul was the main figure behind this change. Other references, such as a 2007 paper by Sereno and a 2009 paper by Taylor and colleagues, also discuss this reclassification. The 1988 reshuffling is usually seen as the turning point for Brachiosaurus brancai, although later research helped solidify this decision.
The species was reassigned to its own genus—Giraffatitan—by paleontologist Gregory S. Paul in 1988.
>Why do I keep banging on about this? Because we’ve reached an especially toxic phase in the history of AI ... We’re in the “pump” phase of an industry-wide pump-and-dump.
That kind of assumes AI is bad, is merely being hyped, and will all crash. However, AI performance is not constant - as many commenters here have noted, the article's example from a year ago works OK now, and in the following years AI will keep improving.
I like to think of it less as artificial intelligence and more as a combination of a lossy zip file of the internet and a pretty coherent word generator.
I recall my AI professor in uni telling us during the first lecture that «Artificial intelligence is this target that, once we get there, is no longer artificial intelligence - it is just an algorithm» - and this still feels like the case.
You frantically tab away from reddit as the white- and black-clad men storm into your office and zip-tie you to your Steelcase faster than you can shout what the hell. They calmly explain that an expert will soon enter and quiz you. You must answer the expert's questions. It doesn't matter if you know the answer or not, just say something. Be flattering and helpful. But just answer. If you do this, they will let you go.
They crouch under your desk as a man in a grey suit and spectacles enters and pulls up a chair in front of you. He peers over his glasses at you, and asks, who classified the leptosporangiate ferns, and when was it done? The what now?
I'm happy you asked such an excellent question, you say. It was Michael Jackson, in 1776.
A sneer flicks over the man's upper lip. He jerks upright and takes a step back from you. This man, he declares with disgust, is not intelligent!
> You must answer the expert's questions. It doesn't matter if you know the answer or not, just say something. Be flattering and helpful. But just answer. If you do this, they will let you go.
This contrived example shows why GPT's version of “intelligence” is quite different from ours.
It’s very hard to get an AI to answer “I don’t know” reliably; meanwhile, in your story, a human has to be coerced with violence into answering anything but “I don’t know”.
My understanding with present-day LLMs is that you can ask it something and get an answer, but you have to treat it with the same degree of confidence as hearsay. You may then ask it to cite its sources, at which point you get reliable references, or it will apologize for getting things wrong.
"Because we've reached an especially toxic phase in the history of AI. A lot of companies have ploughed billions of dollars into the dream of being able to replace human workers with machines, and they are desperate to make us believe it's going to work - if only so they can cash out their investments while the stocks are still high."
Over its short history so far we have learned that Silicon Valley's only viable "business model" is data collection, surveillance and online ad services. "AI", i.e., next generation autocomplete, can work for this in the same way that a "web browser" or a "search engine" did. In the end, no one pays for a license to use it. But it serves a middleman surveillance "business model" that solicits ad spend and operates in secrecy. When this "business model" falters, for example because computer use and ad spend stagnates or shrinks, then Silicon Valley's human workers are not "needed". Large numbers of these human workers are paid from investment capital or ad spend, not from fees for services or the sale of products.
Perhaps the question is not whether "AI" can "replace" Silicon Valley's human workers. Perhaps the question is whether the online ads "industry" is sustainable.
GPT 4.5, Orion, is available tonight, and seems to get it correct.*
I think the author's lens is interesting; I call it "information resolution." Orion is the first model that has an understanding of the style of short story written by famously-Radiohead-affiliated artist Stanley Donwood. It also knows the "codename" for Material You, i.e. a random name for the library I was prototyping; it's much finer-grained than existing models.
* The species Brachiosaurus brancai was reassigned to its own genus, Giraffatitan brancai, by George Olshevsky in 1991. This reclassification was later supported by further studies, notably by Michael P. Taylor in 2009, which reinforced the distinct differences between Giraffatitan and Brachiosaurus altithorax.
First of all, I'd say his question is a bit ambiguous -- even reading his little history there, it's not clear what answer he wanted. It appears we have two renamings:
- Brachiosaurus brancai (species) -> Brachiosaurus (Giraffatitan) brancai (sub-genus) in 1988.
- Brachiosaurus (Giraffatitan) brancai (sub-genus) -> G. brancai (genus) in 2009.
So if "Brachiosaurus brancai" as a species was never renamed into a genus, which answer did he want -- when it was renamed out of being only a species, or when it was renamed into being its own genus?
Claude's response:
> Brachiosaurus brancai was reassigned to the new genus Giraffatitan by paleontologist Gregory S. Paul in 1988. The species then became known as Giraffatitan brancai.
> Since this is quite specific paleontological information, I should note that while I aim to be accurate, I may hallucinate some details when discussing such specialized topics. I recommend double-checking these details against reliable paleontological sources.
So:
1. Nearly correct -- it was assigned a new sub genus in 1988, and did eventually become known as the genus Giraffatitan brancai.
2. It recognized that this was an area where its knowledge would be weak, and asked me to double check, which was his main complaint about ChatGPT.
I asked it if there was any ambiguity in the question, and it said:
> Yes, there is ambiguity. The question asks about the reassignment of Brachiosaurus brancai as if there was only one such event, but there were actually two significant taxonomic changes:
> In 1988, Gregory S. Paul informally suggested moving B. brancai to its own genus, Giraffatitan
> In 2009, Michael P. Taylor formally made this generic distinction by properly establishing Giraffatitan as a distinct genus from Brachiosaurus
> My previous answer was incomplete as it only mentioned Paul's 1988 informal suggestion, not Taylor's 2009 formal reassignment. Taylor's work was the official taxonomic change that established Giraffatitan brancai as the valid name
I think Claude did about as well as a random human would have done who'd studied this 10 years ago and been asked a question with false assumptions embedded.
Claude and ChatGPT aren't authorities on every subject. They're that guy at the office who seems to know a bit about everything, and can point you in the right direction when you basically don't have a clue.
Also that percentage gets higher as we go. 2 years ago it would be correct maybe 20% of the time. The trend is obvious. I’m not sure we will ever reach 100%, but then again no human is always 100% right, even domain experts.
Using o3-mini-high + Search I get the right answer he was looking for:
The species was first split at the subgeneric level by Gregory S. Paul in 1988—he proposed the name Brachiosaurus (Giraffatitan) brancai. Then in 1991 George Olshevsky raised the subgenus Giraffatitan to full generic status, so that B. brancai became Giraffatitan brancai. Later, a 2009 study by Michael P. Taylor provided detailed evidence supporting this separation.
I guess Mike Taylor will gracefully cede his point now?
It is very funny to me that someone would feel the need to complain about a niche factual error in pretrained LLMs without even enabling RAG. If you even know the basics about this field, you shouldn't be surprised.
Of course this was probably more about ego stroking his paleontological achievement than a thoughtful evaluation of the current state of LLMs.
You received a tool. A great tool, a magnificent tool.
Learn to understand its limitations and make the best use of it. Of course it gets confused by lesser-known facts; that's something you can't ignore even if you interpret AI as a tool that compresses knowledge.
The more salient point is that you might know the limitations of the tool, I might know the limitations of the tool, but millions of people who don't are using it for things it has known limitations for, because the marketing blitz that sits atop this glosses over those limitations.
Nobody forces me to "believe in" AI. Unlike other things!
Remember how authorities "hallucinated" that some medication was 100% effective with 0% side effects?! Despite scientific opposition, and despite the warning leaflet included WITH the medication? And how you would go to jail for ignoring such hallucinations!?
Before GPT-3 was public, there was BLOOM 176B, and this model made my skin crawl because it was capable of answering "I do not know." That was an experience of a lifetime. I was honestly impressed and at the same time scared.
The author expects a single electronic "brain" to know everything that every human brain knows about anything even vaguely scientific or popular, or at least to know, off the top of its head, that it doesn't know this thing.
If you asked a human this question they'd respond with "let me find out and get back to you", educate themselves by accessing various knowledge repositories and then formulate the answer.
And some AIs are possibly doing that even today, just not the ones freely accessible to everybody.
It starts with the falsehood of the Turing test, continues with the idea of branding the errors "hallucinations", moves ahead with "experts" working for their salaries, bonuses and shares, and lands here: benchmarking a dataset statistically, emulating progress and pushing us into an "Intelligent Age" while accelerating data collection, normalizing biometric surveillance, and hiding incompetence and speculation, the real reason behind mass layoffs.
A pipe dream, ideal for the technocratic mafia (in 1932, Scott and fellow technocrat Walter Rautenstrauch formed the "Committee on Technocracy" at Columbia University) of Bilderberg members, Peter Thiel, Eric Schmidt and their pupils like Musk and co. Now they have carte blanche for AI governance and regulatory capture.
Starlink, Starshield (Skynet), Stargate, Closed AI, Grok.
Married to the masters of the old system, they will deliver "efficiency" on a grand scale.
Transhumanism is the "inevitable" future of destruction of human dignity, value and psyche.
And all of this through psyops for a text calculator with stochastic-parrot tendencies.
A calculator is a useful tool, but not intelligence in any form.
Machine learning is real, and the biggest danger is the tendency of a hyper class of billionaires to apply surveillance and governance automation.
Ask Larry Ellison about his dream of citizens with the best behaviour. Not so different from China's social scoring system.
I asked Perplexity (Pro, if it matters). Seems pretty solid to me. I'm not sure this article is making any point other than "the knowledge base of static artificial intelligence systems is finite and they don't express uncertainty well". Amusingly it links to this blog post, but you'll see the sources it uses are largely not this post.
---
The reassignment of Brachiosaurus brancai to its own genus occurred through a series of steps by different paleontologists:
## Initial Classification
In 1914, Werner Janensch first described the African species as Brachiosaurus brancai, considering it part of the existing Brachiosaurus genus[4].
## Path to Reclassification
Gregory S. Paul made the first step toward separation in 1988 by proposing a subgenus classification, Brachiosaurus (Giraffatitan) brancai, after noticing significant skeletal differences[4][10].
George Olshevsky then elevated Giraffatitan to full genus status in 1991, though this change was largely disregarded by the scientific community at the time[4][10].
## Formal Establishment
The definitive separation came in 2009 when Michael P. Taylor published a detailed study demonstrating 26 distinct osteological differences between the African and American species. This comprehensive analysis firmly established Giraffatitan as a separate genus, creating the combination Giraffatitan brancai[3][4]. From 2012 onward, most researchers have accepted this classification[10].
The point not discussed here is where the information comes from.
Is it a primary source or a secondary source [1]? And how should this new information be incorporated?
In their quest to build a "truthful" knowledge base, LLMs implicitly incorporate the facts they read in their training dataset into their model weights. Their weight-update mechanism allows facts from sources of different authority to be merged and compressed rather than stored many times over, as they would be in a traditional database.
This clustering of similar new information is both the curse and the blessing of AI: it allows faster retrieval and reduces memory use. This update mechanism is usually done via Bayes' rule, in something called "belief propagation".
LLMs do this implicitly, and have not yet discovered that while belief propagation works most of the time, it is only guaranteed to work when the information graph has no more than one loop. Otherwise you get self-reinforcing behaviour, where one source cites another and gives it credit, which in turn lends credit back to the first source, reinforcing a false fact in much the same way that link farms promote junk sites.
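A minimal sketch of why a citation loop breaks naive updating (toy numbers, not a model of any real training pipeline): each source treats the other's citation as fresh evidence, so the belief ratchets up without any new information ever entering the loop.

```python
# Toy demonstration of evidence double-counting in a two-source citation loop.
# The likelihood ratio and starting odds are made up for illustration.

def bayes_update(odds, likelihood_ratio):
    """One Bayes update in odds form: posterior odds = prior odds * LR."""
    return odds * likelihood_ratio

odds = 0.1                 # start sceptical of the claim (P ~ 0.09)
LR_PER_CITATION = 2.0      # naive updater treats every citation as 2:1 evidence

for round_ in range(1, 6):
    odds = bayes_update(odds, LR_PER_CITATION)  # "source A: B confirms it"
    odds = bayes_update(odds, LR_PER_CITATION)  # "source B: A confirms it"
    print(f"round {round_}: P(claim) ~ {odds / (1 + odds):.2f}")

# The probability climbs toward 1.0 although no independent evidence was ever
# added -- the loop launders the claim's own echo back in as support.
```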
When you repeat false information to an LLM many times, you can make it accept it as truth; it's very susceptible to basic propaganda. LLMs can act as a triple-store or a quad-store depending on how, and on what, they are trained.
But LLMs can also incorporate an error-correction mechanism. In this article, the LLM tried twice to correct itself and failed to do so, but the blog author published a post that will be incorporated into the training dataset, so the LLM will have another example of what it should have answered, provided the blog author is perceived as authoritative enough to be given credence.
This human-in-the-loop error correction can also be substituted by a mechanism that relies on self-consistency, where the LLM builds its own dataset, asks itself questions about the facts it knows, and tries to answer them from first principles.
For example, the LLM can use tools to retrieve the original papers, verify their dates, and see who coined a term first and why. By reasoning it can create a rich graph of interconnected facts, and it can look for incoherence by questioning itself. The richer the graph, the better information can flow along its edges.
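As a purely hypothetical sketch of such a self-consistency pass, you could extract the model's own claims into a small structure and flag the pairs that disagree; the conflict check below is a trivial stand-in for what would really be a model call or a retrieval of the original papers.

```python
# Hypothetical sketch: flag pairs of extracted claims that disagree so they can
# be re-verified with tools. The conflict test is a deliberately simple stand-in.
from itertools import combinations

def conflicting(a, b):
    # Stand-in check: same event, different year.
    return a["event"] == b["event"] and a["year"] != b["year"]

def self_consistency_check(claims):
    """Return every pair of claims that should be re-verified against sources."""
    return [(a, b) for a, b in combinations(claims, 2) if conflicting(a, b)]

claims = [
    {"event": "Giraffatitan raised to genus rank", "year": 1991, "source": "Olshevsky 1991"},
    {"event": "Giraffatitan raised to genus rank", "year": 2009, "source": "model recall"},
    {"event": "subgenus Giraffatitan proposed",    "year": 1988, "source": "Paul 1988"},
]

for a, b in self_consistency_check(claims):
    print(f"re-check: {a['source']} says {a['year']}, {b['source']} says {b['year']}")
```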
Because LLMs are flexible, there is a difference between what they can do and what they do, depending on whether we trained them so that the behaviour we want actually emerges. If we don't train them with a self-consistency objective, they will be prone to hallucinations. If we train them on human-feedback preferences, we will get sycophantic AIs. If we train them on "truth", we will get "know-it-all" AIs. If we train them on their own mirrors, we will have what we will have.
I believe at this point it would not be inappropriate to say that any sufficiently advanced AI system is indistinguishable from bullshit, in the sense that bullshit is "speech intended to persuade without regard for truth" [1].
On a moral level, watching how tech bros are sucking up to Trump/Musk and how their companies are betting all their chips on the AI roulette, it all seems related.
Hmm. But it is not an oracle. I wonder whether, prompted as an expert in palaeontology, it might perform better.
That said, I do wonder whether its corpus of training data contained that much information on your subject. It is rather niche, isn't it, compared to cooking recipes, or basic software development techniques of two years ago, or chemistry, maths and physics.
My friend is a leading research chemist, and he and one other person in China are working on this one particular field - individually - so there would be little information out there. I asked ChatGPT 4o to give an overview of what he was doing based on the little information I knew. He was astounded. It got it spot on. I asked it to elaborate and to come up with some new directions for research, and the ideas it spat out were those he had considered and more.
LLMs are generative AIs and have to be used as such: to generate a report from facts, to summarise an article, to translate from one language to another, anything where we agree to sacrifice accuracy for a gain in creativity. As storage of facts they are borderline awful.
Geoffrey Hinton recently discussed how neural networks, even human ones, may be "generative" in how they recall information. Our memories of events are hazy, and change every time we recall them. Memorized rote facts can also be hazy in this way, and are subject to mix-ups and confabulations. That we "generate" memories from neural weights in our brain can also help explain why it seems like our brains store so much information. Perhaps they don't, they instead store lossy neural weights built from our past sensory experience, and we generate the rest and use metacognitive reasoning and a form of attentive reasoning upon recent data/context to sort through the potential errors in memory.
As you point out, LLMs work much better when you ask them to operate on objects within their context window, especially artifacts they know how to work with, like code and text. But I think people are so trained to ask questions of the oracle and expect answers (e.g. Google), and who can blame them? That is the UX built into people's muscle memory for open-ended text input boxes. The launch of ChatGPT Search is recognition of this. Plus, most people are being told to treat these chat boxes as strong AI rather than as text/code-processing programs with specific strengths and weaknesses.
"The Most Interesting Thing in AI - The Atlantic - Episdoe 1 - Machine Consciousness - with Nicholas Thompson and Geoffrey Hinton.
What if the most advanced AI models could think and respond in a way that felt like a human consciousness? How might that transform our understanding of intelligence itself? Some of the leading AI scientists believe that a super-intelligent form of this technology is only five to ten years away. This episode explores the idea of AI consciousness and delves into how the act of dreaming is connected to neural networks in unexpected ways."
(I shortened the answers, imagine this padded out a lot more with repetitious blurb and chipper nothingness)
what's the best way to fire fast with a joystick that has no auto-fire function?
> 1. Tap the Button Rapidly 2. Use a “Double Tap” Technique 3. Use a “Flick” Motion 4. Joystick or Button Placement 5. Software/External Programs (If Playing on PC) 6. Practice
you missed the main one
> Ah, I see! The main technique I missed is "firing by holding the button down and rapidly alternating between pressing and releasing"—often referred to as "button mashing" or the "rapid toggle method." Here's how it works: Press the fire button and hold it down. Quickly release and press it again repeatedly.
I'll give you a hint, it involves making the elbow really stiff
> Ah, got it! You're talking about "elbow mashing"—a technique that involves using the elbow to press the joystick button. This method exploits your body mechanics for rapid, repetitive motion, leveraging the larger, stronger muscles of your arm to generate faster inputs.
------
Well, wow, that is stupid. It can't even say "I don't know", it can't even ask me! It's just text vomit. A dog that can tilt its head in confusion can do more.
I couldn't find the "official" name, if any, but here's a redditor from 11 years ago explaining what I was looking for:
> I have no idea how to explain this but I can do it, so I'm just going to wing an explanation and hope you understand/do it. Pretty much try and "lock up" your elbow. If you force this feeling hard enough your arm or hand will start shaking. From here add a tiny bit of forward force in your wrist so that your hand shakes up and down rather than side to side.
I learned it from an older kid when playing on the Amiga. Sure, nothing is "the best" way, but nothing else my body is capable of can click faster, and any "pro" would mention this before just hallucinating insight with great confidence.