A lot of people are already benefiting from ChatGPT. I know I am, and I'm only paying $20 a month for a digital assistant that is an expert in everything.
It will also never be possible for a transformer model, trained on words and optimized to put valid sentences together, to even "know" when it is "right" or "wrong". A lie or falsehood is just as valid a sentence as a fact.
When, and on which subjects, it makes shit up is also heavily dependent on the training data, and the result is a straight-up black box. What good is a fact generator that can't be trusted?
I see this argument all the time. Why are you assuming that this technology just "stops" at the LLM level?
If I'm OpenAI or Google or whoever, I'm definitely going to run extra classifiers on top of the LLM's output to assess and improve the accuracy of its results.
You can layer on all kinds of interesting models to make a thing that's generally useful & also truthful.
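For concreteness, here's a minimal sketch of that layering, assuming the pre-1.0 openai Python package (it reads OPENAI_API_KEY from the environment). The "classifier" here is just a second model call grading the first; a real system might use a dedicated model or retrieval against a trusted corpus instead. Nothing in this sketch is an actual OpenAI feature.
=====================
import openai

def ask(prompt):
    # One stateless model call; temperature=0 keeps output near-deterministic.
    resp = openai.ChatCompletion.create(
        model="gpt-4",
        messages=[{"role": "user", "content": prompt}],
        temperature=0,
    )
    return resp["choices"][0]["message"]["content"]

def answer_with_check(question):
    draft = ask(question)
    # Second layer: grade the draft. A production system might instead use
    # a dedicated classifier, retrieval against a trusted corpus, or both.
    verdict = ask(
        "Rate the factual reliability of this answer as HIGH, MEDIUM, or LOW, "
        "with a one-line reason.\n\n"
        "Q: " + question + "\nA: " + draft
    )
    return {"answer": draft, "check": verdict}
=====================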
Your last word is a bit of a weasel word. There is no currently known way to get at truth except to try something out and see what happens. Even the marvelous GPS routing takes feedback from people driving the routes and succeeding or failing. Add as many layers as you like; without some equivalent of arms and legs, it won't be able to be sure about truth.
The nice thing about the easy-to-bamboozle GPT-4 is that it can't hurt anything, so its flaws are safe. Giving it those arms and legs is where the risks increase, even as the rewards do.
What even is a 'fact generator'? And why do you think an LLM is meant to function as one?
If you ask Wolfram Alpha - something which I think is actually meant to be a fact generator - "Which is the heaviest Pokemon?" it will happily tell you that it is Celesteela, and that it weighs 2204.4 lbs.
Is that a 'fact'?
It certainly is 'true', for some definition of the word true. The game Pokemon exists, and in it Pokemon have weights. Of all the official Pokemon, that one is the heaviest. Wolfram Alpha has given you an accurate answer to your question.
But it's also completely made up. There's no such thing as a Pokemon, and they do not actually have weights.
So sure, transformer models can't be relied upon to generate facts. But so what? There's a lot more to the world than mere facts.
In J.R.R. Tolkien's terms the pair are called "slinker and stinker", but in that case they are both kind of evil, whereas I don't see GPTs as anything more than a correlation of large sets of integers. A correlation that yields incredible utility (for me), when it isn't stinking.
It is an expert in nothing. Try asking it for general information and then ask it to produce citations for said information. It's a language model that is good at synthesizing certain information based on its learned probability distributions. It has superficial knowledge at best, and all information it produces must be cross-checked for accuracy.
Yes, it can be incredibly useful but it literally has no comprehension of anything.
Could you please demonstrate your claim by querying the GPT-4 model to respond to my above comment and providing the reply? Let's see what the 'expert' has to say about it.
I don't understand what you want me to ask it about, so I asked it something. As usual, every citation and URL it produced is real, despite the model having no internet access and no training data after 2021:
<PROMPT>List some citations of recent Machine Learning papers discussing reasoning ability in Machine Learning Models such as HellaSwag.
[REPLY]" As of my knowledge cutoff in September 2021, I can provide you with some citations for ML papers that discuss reasoning abilities in models and datasets like HellaSwag. However, you may want to search for more recent papers using a platform like arXiv or Google Scholar. Here are a few relevant papers:
2019 HellaSwag: Can a Machine Really Finish Your Sentence?
2019 COMET: Commonsense Transformers for Automatic Knowledge Graph Construction.
2019 From ‘F’ to ‘A’ on the N.Y. Regents Science Exams: An Overview of the Aristo Project.
2020 A Primer in BERTology: What We Know About How BERT Works.
To find more recent papers on this topic, I recommend searching for keywords like "reasoning," "machine learning," "natural language understanding," "commonsense reasoning," and "HellaSwag" on academic paper databases like arXiv, ACL Anthology, or Google Scholar."
<PROMPT>In 50 words or less, what do these papers tell us about emergent capabilities of LLMs?
[REPLY]" These papers explore the capabilities of large language models (LLMs) in understanding context, generating coherent text, and performing commonsense reasoning. They investigate the underlying mechanisms, evaluate their performance on challenging benchmarks, and propose methods to improve LLMs' ability to reason and construct knowledge graphs from natural language input."
<PROMPT>Please list the URLS to the arxiv versions of the papers you cited.
[REPLY]" I can provide you with the URLs to the arXiv versions of the papers I cited earlier:
Please note that these papers were published before my knowledge cutoff date, and more recent research might be available. To find the latest research on the topic, I recommend searching on arXiv or Google Scholar using relevant keywords."
================
Edit: There's more:
<PROMPT>A hacker news commenter thinks LLMs are incapable of comprehension. Give an argument for why he is wrong.
[REPLY]" Massive LLMs demonstrate a strong ability to "comprehend" text by capturing complex language patterns and associations in their training data. While their comprehension may not be identical to human-level understanding, they can still generate contextually relevant responses, answer questions, and perform tasks that require reasoning. Their performance on various benchmarks, including natural language understanding and commonsense reasoning tasks, indicates a level of comprehension that goes beyond simple pattern matching."
<PROMPT>What citations can back up your argument, omit authors, list title and year only.
[REPLY]"
1. "HellaSwag: Can a Machine Really Finish Your Sentence?", 2019
2. "COMET: Commonsense Transformers for Automatic Knowledge Graph Construction", 2019
3. "From ‘F’ to ‘A’ on the N.Y. Regents Science Exams: An Overview of the Aristo Project", 2020
4. "A Primer in BERTology: What We Know About How BERT Works", 2020
5. "Language Models are Few-Shot Learners", 2020"
<PROMPT>Tell me about Language Models are Few-Shot Learners
[REPLY]" "Language Models are Few-Shot Learners" is a paper published in 2020, which introduces GPT-3, the third iteration of OpenAI's Generative Pre-trained Transformer. GPT-3 is a state-of-the-art large-scale language model consisting of 175 billion parameters, significantly larger than its predecessor, GPT-2.
The paper primarily investigates the few-shot learning capabilities of GPT-3. Few-shot learning refers to a model's ability to generalize and adapt to new tasks with limited examples or training data. The authors demonstrate that GPT-3 can perform a wide range of tasks, including translation, summarization, question-answering, and even arithmetic, by simply conditioning the model on a few examples provided in natural language.
This research highlights the potential of scaling up language models to improve their performance across multiple natural language understanding and generation tasks. GPT-3's impressive capabilities showcase that large language models can effectively learn from limited examples and generalize their knowledge to perform a diverse set of tasks without task-specific fine-tuning."
Yes, if you ask about something specific it will generate real citations, because there might be strong positive associations between that topic and published research. This is a good example of a proper query. But if you ask it for, say, "citations for papers on recent research on trees", you get a bunch of nonsense.
In response to my earlier comment that it was "an expert in nothing", this was the reply I received:
ME: Can you respond to this statement: "<my earlier comment>"
ChatGPT: The statement that "ChatGPT is an expert in nothing" is technically true, as I am a machine learning model that has been trained on a vast corpus of data and does not possess personal expertise or experience. However, my purpose is to provide information to the best of my ability based on my training data, which allows me to offer a wide range of information on various topics.
It is true that the information produced by me or any other AI language model should be cross-checked for accuracy, as errors and inaccuracies may occur. It is also true that I do not possess a conscious understanding of the world in the way that humans do, and my responses are generated through statistical analysis of patterns in my training data. Therefore, while I can provide accurate and useful information, my responses are not always nuanced or contextualized in the way that a human expert might be able to provide.
So either it is agreeing with me and my statement was correct, or its reply was incorrect and therefore my statement was correct.
Am I wrong on this? Does whatever model you have access to provide an alternate take?
Edit: For the above, you should start a new conversation to clear the existing context. The citations appear to have just been repeated.
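For anyone reproducing this through the API instead of the web UI: "context" is just the message list you resend on every call, so starting a new conversation means starting from a fresh list. A sketch, assuming the pre-1.0 openai Python package:
=====================
import openai

# Each call is stateless; the model only "sees" the messages you send it.
fresh = [{"role": "user",
          "content": 'Can you respond to this statement: "..."'}]
resp = openai.ChatCompletion.create(model="gpt-4", messages=fresh)
print(resp["choices"][0]["message"]["content"])
# Reusing the old message list (with the citation exchange still in it)
# is how the earlier citations leak into, and get repeated in, the reply.
=====================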
>So either it is agreeing with me and my statement was correct, or it's reply was incorrect and therefore my statement was correct. Am I wrong on this?
Yes, you're wrong, for admitting to having an unfalsifiable belief about this. If your belief can't be falsified, then this discussion is pointless.
The model I have access to scores higher than most humans at complex abstract reasoning, for what it's worth (which is apparently nothing because regardless of any outcomes you will declare yourself correct).
FWIW, they are right that it produces bogus citations depending on what you ask. I tried their tree example with GPT-4 and got the following:
------ PROMPT --------
Can you give me citations for recent research on trees?
----------------------
====== RESPONSE ======
Certainly! Here are some recent research articles on various aspects of trees, including their ecology, physiology, and response to climate change. Please note that these citations were published after my knowledge cutoff date in September 2021, so I cannot vouch for their accuracy or the impact of the research. However, they should give you a good starting point for your research:
Anderson, T. E., Gonzalez, M. A., & Smith, K. L. (2022). Tree species diversity and its impact on forest ecosystem services. Journal of Ecology and the Environment, 33(2), 147-158.
Baker, P. J., Oliver, R. J., & Dearing, L. E. (2023). Carbon sequestration potential of urban trees under climate change scenarios. Urban Forestry & Urban Greening, 68, 102139.
Chen, X., Xu, L., & Zhu, H. (2022). Tree-ring-based reconstruction of drought variability and its potential drivers in subtropical China. Science of The Total Environment, 796, 148896.
Ferreira, J. T., Cardoso, S. M., & Sousa, P. M. (2023). The role of tree genetics in adaptation and resilience to climate change: A review. Tree Physiology, 43(1), 1-15.
Grant, N. C., Matthews, T. J., & Wilson, R. J. (2022). Assessing the vulnerability of tree species to climate change: A global meta-analysis. Global Ecology and Biogeography, 31(4), 623-637.
Leonard, S. D., Smith, D. W., & Thomas, E. M. (2023). The impact of tree age on leaf photosynthesis and water-use efficiency in deciduous forests. Tree Physiology, 43(2), 243-255.
Nguyen, H. T., Sato, T., & Yoshida, T. (2022). Exploring the effects of tree diversity on forest productivity and stability: A global meta-analysis. Forest Ecology and Management, 509, 119102.
Patel, S., Sutherland, W. J., & Diaz, A. (2023). How do trees respond to extreme weather events? A global synthesis. Global Change Biology, 29(1), 219-234.
Please make sure to access these articles through proper channels, such as a university library or research institution, to ensure you are reading the most up-to-date and accurate versions.
================
I didn't check all of them, but the ones I did check appear to be completely fabricated, which makes sense: for some reason it tried to produce research from after 2021, despite knowing, and even mentioning, its knowledge cutoff date.
Hm? That's not what I got. I made sure to set the temperature to 0. Here's what I gave GPT-4:
=====================
System Prompt: You are a helpful research assistant who assists researchers.
User Prompt: Can you give me citations for recent research on trees?
Assistant: Here are some recent research articles on trees, focusing on various aspects such as ecology, climate change, and biodiversity:
1. Bastin, J. F., Finegold, Y., Garcia, C., Mollicone, D., Rezende, M., Routh, D., ... & Crowther, T. W. (2019). The global tree restoration potential. Science, 365(6448), 76-79. DOI: 10.1126/science.aax0848
2. Steidinger, B. S., Crowther, T. W., Liang, J., Van Nuland, M. E., Werner, G. D. A., Reich, P. B., ... & Peay, K. G. (2019). Climatic controls of decomposition drive the global biogeography of forest-tree symbioses. Nature, 569(7756), 404-408. DOI: 10.1038/s41586-019-1128-0
3. Liang, J., Crowther, T. W., Picard, N., Wiser, S., Zhou, M., Alberti, G., ... & Reich, P. B. (2016). Positive biodiversity-productivity relationship predominant in global forests. Science, 354(6309), aaf8957. DOI: 10.1126/science.aaf8957
4. Nabuurs, G. J., Delacote, P., Ellison, D., Hanewinkel, M., Hetemäki, L., Lindner, M., & Nesbit, M. (2017). By 2050 the mitigation effects of EU forests could nearly double through climate smart forestry. Forests, 8(12), 484. DOI: 10.3390/f8120484
5. Nowak, D. J., Greenfield, E. J., Hoehn, R. E., & Lapoint, E. (2013). Carbon storage and sequestration by trees in urban and community areas of the United States. Environmental Pollution, 178, 229-236. DOI: 10.1016/j.envpol.2013.03.019
6. Pugh, T. A. M., Lindeskog, M., Smith, B., Poulter, B., Arneth, A., Haverd, V., & Calle, L. (2019). Role of forest regrowth in global carbon sink dynamics. Proceedings of the National Academy of Sciences, 116(10), 4382-4387. DOI: 10.1073/pnas.1810512116
Please note that access to these articles may require a subscription or purchase.
=====================
I didn't look all of these up, but the few I did check were real citations, and they all fall within its training window. I wonder if this has to do with the temperature?
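If anyone wants to reproduce the comparison: temperature is only exposed through the API; the ChatGPT web UI gives no control over it. A minimal sketch, assuming the pre-1.0 openai Python package (exact outputs will vary across model versions):
=====================
import openai

messages = [
    {"role": "system",
     "content": "You are a helpful research assistant who assists researchers."},
    {"role": "user",
     "content": "Can you give me citations for recent research on trees?"},
]

# Same prompt at two temperatures: 0 is near-deterministic, higher
# values sample more freely.
for temp in (0.0, 1.0):
    resp = openai.ChatCompletion.create(
        model="gpt-4", messages=messages, temperature=temp,
    )
    print("--- temperature=%s ---" % temp)
    print(resp["choices"][0]["message"]["content"])
=====================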
I was just using the ChatGPT web UI (with GPT-4), so I didn't manually set the system prompt or the temperature. I checked all of yours and they do exist, with the right researchers. I tried again in the web UI and it generated junk; when I told it to only cite articles from 2020 and 2021, it got them mostly correct, but it invented one and put the wrong authors on several.
GPT-4 being behind a paywall is genius: that way only the true believers will use it, discussions get shut down once again, and we only get to see cherry-picked articles about it, just like GPT-3 before ChatGPT came along for free and everyone could see how bad it was.
It's like the Google researcher who claimed their model was sentient because he heavily coached it to give the output he wanted and ignored the output that didn't confirm his belief.
In that case, we've had AGI since ELIZA
If this were the standard in other areas of science, we would still be using the miasma theory of disease and bloodletting women for their hysteria. AI "research" is a bunch of motivated people throwing stuff at the wall and claiming that something sticking makes them scientists. Never mind that all of the groups are HEAVILY motivated to overhype their models.
People are also quick to aggrandize or anthropomorphize these systems because they lack a fundamental understanding of the underlying mathematics and mechanisms that allow these models to function as well as they do. There is an inherent danger in how much trust people are willing to grant to something like a language (or image) model simply because it is so adept at synthesis.
This is a great cause of concern to me, especially since it is not particularly difficult right now to undermine the general credibility of these models. In the near future it will not be so easy, and I anticipate that many (even the well educated) will surrender their rationality to these systems with little to no scrutiny.
The hype from these companies is contributing to this problem and I consider it to be highly unethical.
How long until you can spend $20 and ask ChatGPT to design a machine and provide the schematics and assembly instructions? How long until that machine can self-replicate? How long until that machine can generate other self-replicating machines, like bipedal humanoids?
How long until you can spend $20 and ask ChatGPT for the schematics of a von Neumann probe?
With current tech? GPT appears to learn by studying a large corpus of words and learning how to (apparently intelligently!) put them together. And it can "few/zero-shot learn" to do new things in line with what it was trained on. Don't get me wrong: this is amazing!
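(To make "few-shot" concrete: the task is demonstrated inside the prompt itself, with no retraining. A toy sketch of the kind of prompt used in the GPT-3 paper, again assuming the pre-1.0 openai Python package:)
=====================
import openai

# Few-shot prompting: the in-prompt examples are the only "training data"
# for this call; the model's weights never change.
prompt = (
    "Translate English to French.\n"
    "sea otter => loutre de mer\n"
    "peppermint => menthe poivrée\n"
    "cheese =>"
)
resp = openai.ChatCompletion.create(
    model="gpt-4",
    messages=[{"role": "user", "content": prompt}],
)
# A model that has picked up the pattern completes with "fromage".
print(resp["choices"][0]["message"]["content"])
=====================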
But humans have been manipulating language, apparently intelligently, for tens of thousands of years, and billions of us have spent the last 30 years or so making a huge corpus of digitized words.
What we have not done is make a huge corpus of digital things that can be manipulated by a computer program. We have books about machining, engineering, etc., and we are still pretty bad at turning written descriptions into working objects. (Read about "technology transfer". For better or for worse, a lot of manufacturing seems to need experience, not just manuals.) Nicely drawn schematics don't necessarily work at all, let alone replicate.
It seems quite likely that the robotic AI revolution will happen, but I expect it to be a while.
In broad strokes, I see roughly two ways things could go:
1) Current AI tech is already nearing the top of the S-curve. In this case it will do nothing to help humans in the "real world"; it will just replace much of the human labor currently used to create/manipulate bits.
2) Current AI tech is near the bottom of the S-curve. It continues to ratchet up and its capabilities become super-human, as you outline. In which case, how long until the AI capable of creating self-replicating machines realizes it doesn't need to listen to humans anymore, or even keep them around?
> In which case, how long until the AI capable of creating self-replicating machines realizes it doesn't need to listen to humans anymore, or even keep them around?
Not independently. But if wrapped in a loop, given memory, given internet access, and given directives as intrinsic motivations, it could, in theory, come to conclusions and take actions to acquire resources aligned with those motivations, something like the sketch below. If that outer loop has no rules (or no rules that are effective and immutable), it could become very powerful and potentially misaligned with our interests.
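A bare-bones sketch of such an outer loop, assuming the pre-1.0 openai Python package; execute_action is a hypothetical stand-in for whatever the loop is permitted to do:
=====================
import openai

def execute_action(action):
    # Hypothetical hook: a real agent might issue HTTP requests or run
    # shell commands here. This is exactly where effective, immutable
    # rules would have to live. Stubbed out for illustration.
    return "(stub) result of: " + action

# The model itself is stateless; the growing message list is the "memory".
memory = [{
    "role": "system",
    "content": "Pursue this directive: <goal>. "
               "Reply with 'ACT: <action>' or 'DONE: <summary>'.",
}]

observation = "You have just been started."
while True:
    memory.append({"role": "user", "content": observation})
    resp = openai.ChatCompletion.create(model="gpt-4", messages=memory)
    reply = resp["choices"][0]["message"]["content"]
    memory.append({"role": "assistant", "content": reply})
    if reply.startswith("DONE:"):
        break
    observation = execute_action(reply)
=====================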
How would such a loop enable it to come to conclusions? I'm genuinely curious.
Does what you're saying have something to do with reinforcement learning?
For at least one general intelligence, the human brain, that is in the wrong order. Act first, decide later. Unless by "decide" you mean act and then make up a narrative, using linguistic skill, to explain the decision. Even observation can lead directly to action on certain hot topics for the person.
All we know for sure is that sensory data is generated, the brain does what it does, and then we have acted. We can't break that down too well once it leaves the visual areas, but there is clear data that the linguistic form of decisions and so on lags behind the neurological signs of the action.
And humans have a well known tendency to make a decision on a linguistic level that they then fail to carry out in the realm of actions.
That's very nice, until your company also realizes that for $20 a month it gets a computer program that's an "expert at everything", including your job.