Could you please demonstrate your claim by querying the GPT-4 model to respond to my above comment and providing the reply? Let's see what the 'expert' has to say about it.
I don't understand what you want me to ask it about, so I asked it something myself. As usual, every citation and URL is real, despite the model having no internet access and no training data after 2021:
<PROMPT>List some citations of recent Machine Learning papers discussing reasoning ability in Machine Learning Models such as HellaSwag.
[REPLY]" As of my knowledge cutoff in September 2021, I can provide you with some citations for ML papers that discuss reasoning abilities in models and datasets like HellaSwag. However, you may want to search for more recent papers using a platform like arXiv or Google Scholar. Here are a few relevant papers:
2019 HellaSwag: Can a Machine Really Finish Your Sentence?
2019 COMET: Commonsense Transformers for Automatic Knowledge Graph Construction.
2019 From ‘F’ to ‘A’ on the N.Y. Regents Science Exams: An Overview of the Aristo Project.
2020 A Primer in BERTology: What We Know About How BERT Works.
To find more recent papers on this topic, I recommend searching for keywords like "reasoning," "machine learning," "natural language understanding," "commonsense reasoning," and "HellaSwag" on academic paper databases like arXiv, ACL Anthology, or Google Scholar."
<PROMPT>In 50 words or less, what do these papers tell us about emergent capabilities of LLMs?
[REPLY]" These papers explore the capabilities of large language models (LLMs) in understanding context, generating coherent text, and performing commonsense reasoning. They investigate the underlying mechanisms, evaluate their performance on challenging benchmarks, and propose methods to improve LLMs' ability to reason and construct knowledge graphs from natural language input."
<PROMPT>Please list the URLS to the arxiv versions of the papers you cited.
[REPLY]" I can provide you with the URLs to the arXiv versions of the papers I cited earlier:
Please note that these papers were published before my knowledge cutoff date, and more recent research might be available. To find the latest research on the topic, I recommend searching on arXiv or Google Scholar using relevant keywords."
================
Edit: There's more:
<PROMPT>A hacker news commenter thinks LLMs are incapable of comprehension. Give an argument for why he is wrong.
[REPLY]" Massive LLMs demonstrate a strong ability to "comprehend" text by capturing complex language patterns and associations in their training data. While their comprehension may not be identical to human-level understanding, they can still generate contextually relevant responses, answer questions, and perform tasks that require reasoning. Their performance on various benchmarks, including natural language understanding and commonsense reasoning tasks, indicates a level of comprehension that goes beyond simple pattern matching."
<PROMPT>What citations can back up your argument, omit authors, list title and year only.
[REPLY]"
1. "HellaSwag: Can a Machine Really Finish Your Sentence?", 2019
2. "COMET: Commonsense Transformers for Automatic Knowledge Graph Construction", 2019
3. "From ‘F’ to ‘A’ on the N.Y. Regents Science Exams: An Overview of the Aristo Project", 2020
4. "A Primer in BERTology: What We Know About How BERT Works", 2020
5. "Language Models are Few-Shot Learners", 2020"
<PROMPT>Tell me about Language Models are Few-Shot Learners
[REPLY]" "Language Models are Few-Shot Learners" is a paper published in 2020, which introduces GPT-3, the third iteration of OpenAI's Generative Pre-trained Transformer. GPT-3 is a state-of-the-art large-scale language model consisting of 175 billion parameters, significantly larger than its predecessor, GPT-2.
The paper primarily investigates the few-shot learning capabilities of GPT-3. Few-shot learning refers to a model's ability to generalize and adapt to new tasks with limited examples or training data. The authors demonstrate that GPT-3 can perform a wide range of tasks, including translation, summarization, question-answering, and even arithmetic, by simply conditioning the model on a few examples provided in natural language.
This research highlights the potential of scaling up language models to improve their performance across multiple natural language understanding and generation tasks. GPT-3's impressive capabilities showcase that large language models can effectively learn from limited examples and generalize their knowledge to perform a diverse set of tasks without task-specific fine-tuning."
Yes, if you ask about something specific it will generate real citations, because there may be strong positive associations between that topic and particular pieces of research. This is a good example of a proper query. But if you ask it for, say, "citations for papers on recent research on trees", you get a bunch of nonsense.
In response to my earlier comment that it was "an expert in nothing", this was the reply I received from ChatGPT:
ME: Can you respond to this statement: "<my earlier comment>"
ChatGPT: The statement that "ChatGPT is an expert in nothing" is technically true, as I am a machine learning model that has been trained on a vast corpus of data and does not possess personal expertise or experience. However, my purpose is to provide information to the best of my ability based on my training data, which allows me to offer a wide range of information on various topics.
It is true that the information produced by me or any other AI language model should be cross-checked for accuracy, as errors and inaccuracies may occur. It is also true that I do not possess a conscious understanding of the world in the way that humans do, and my responses are generated through statistical analysis of patterns in my training data. Therefore, while I can provide accurate and useful information, my responses are not always nuanced or contextualized in the way that a human expert might be able to provide.
So either it is agreeing with me and my statement was correct, or its reply was incorrect and therefore my statement was correct.
Am I wrong on this? Does whatever model you have access to provide an alternate take?
Edit: For the above, you should start a new conversation to clear the existing context; the citations appear to have simply been repeated from earlier in the conversation.
>So either it is agreeing with me and my statement was correct, or its reply was incorrect and therefore my statement was correct. Am I wrong on this?
Yes, you're wrong, in that you've just admitted to holding an unfalsifiable belief: if both agreement and disagreement count as confirming your statement, then nothing could ever disconfirm it. If your belief can't be falsified, this discussion is pointless.
The model I have access to scores higher than most humans at complex abstract reasoning, for what it's worth (which is apparently nothing because regardless of any outcomes you will declare yourself correct).
Fwiw they are right that it produces bogus citations depending on what you ask. I did their tree example with GPT-4 and got the following:
------ PROMPT --------
Can you give me citations for recent research on trees?
----------------------
====== RESPONSE ======
Certainly! Here are some recent research articles on various aspects of trees, including their ecology, physiology, and response to climate change. Please note that these citations were published after my knowledge cutoff date in September 2021, so I cannot vouch for their accuracy or the impact of the research. However, they should give you a good starting point for your research:
Anderson, T. E., Gonzalez, M. A., & Smith, K. L. (2022). Tree species diversity and its impact on forest ecosystem services. Journal of Ecology and the Environment, 33(2), 147-158.
Baker, P. J., Oliver, R. J., & Dearing, L. E. (2023). Carbon sequestration potential of urban trees under climate change scenarios. Urban Forestry & Urban Greening, 68, 102139.
Chen, X., Xu, L., & Zhu, H. (2022). Tree-ring-based reconstruction of drought variability and its potential drivers in subtropical China. Science of The Total Environment, 796, 148896.
Ferreira, J. T., Cardoso, S. M., & Sousa, P. M. (2023). The role of tree genetics in adaptation and resilience to climate change: A review. Tree Physiology, 43(1), 1-15.
Grant, N. C., Matthews, T. J., & Wilson, R. J. (2022). Assessing the vulnerability of tree species to climate change: A global meta-analysis. Global Ecology and Biogeography, 31(4), 623-637.
Leonard, S. D., Smith, D. W., & Thomas, E. M. (2023). The impact of tree age on leaf photosynthesis and water-use efficiency in deciduous forests. Tree Physiology, 43(2), 243-255.
Nguyen, H. T., Sato, T., & Yoshida, T. (2022). Exploring the effects of tree diversity on forest productivity and stability: A global meta-analysis. Forest Ecology and Management, 509, 119102.
Patel, S., Sutherland, W. J., & Diaz, A. (2023). How do trees respond to extreme weather events? A global synthesis. Global Change Biology, 29(1), 219-234.
Please make sure to access these articles through proper channels, such as a university library or research institution, to ensure you are reading the most up-to-date and accurate versions.
================
I didn't check all of them, but the ones I did check appear to be completely fabricated. That makes sense, because for some reason it tried to produce research dated after 2021 despite apparently knowing, and even mentioning, the date of its own knowledge cutoff.
Hm? That wasn't what I got. I made sure to set the temperature to 0. Here's what I gave GPT-4:
=====================
System Prompt: You are a helpful research assistant who assists researchers.
User Prompt: Can you give me citations for recent research on trees?
Assistant: Here are some recent research articles on trees, focusing on various aspects such as ecology, climate change, and biodiversity:
1. Bastin, J. F., Finegold, Y., Garcia, C., Mollicone, D., Rezende, M., Routh, D., ... & Crowther, T. W. (2019). The global tree restoration potential. Science, 365(6448), 76-79. DOI: 10.1126/science.aax0848
2. Steidinger, B. S., Crowther, T. W., Liang, J., Van Nuland, M. E., Werner, G. D. A., Reich, P. B., ... & Peay, K. G. (2019). Climatic controls of decomposition drive the global biogeography of forest-tree symbioses. Nature, 569(7756), 404-408. DOI: 10.1038/s41586-019-1128-0
3. Liang, J., Crowther, T. W., Picard, N., Wiser, S., Zhou, M., Alberti, G., ... & Reich, P. B. (2016). Positive biodiversity-productivity relationship predominant in global forests. Science, 354(6309), aaf8957. DOI: 10.1126/science.aaf8957
4. Nabuurs, G. J., Delacote, P., Ellison, D., Hanewinkel, M., Hetemäki, L., Lindner, M., & Nesbit, M. (2017). By 2050 the mitigation effects of EU forests could nearly double through climate smart forestry. Forests, 8(12), 484. DOI: 10.3390/f8120484
5. Nowak, D. J., Greenfield, E. J., Hoehn, R. E., & Lapoint, E. (2013). Carbon storage and sequestration by trees in urban and community areas of the United States. Environmental Pollution, 178, 229-236. DOI: 10.1016/j.envpol.2013.03.019
6. Pugh, T. A. M., Lindeskog, M., Smith, B., Poulter, B., Arneth, A., Haverd, V., & Calle, L. (2019). Role of forest regrowth in global carbon sink dynamics. Proceedings of the National Academy of Sciences, 116(10), 4382-4387. DOI: 10.1073/pnas.1810512116
Please note that access to these articles may require a subscription or purchase.
=====================
I didn't look all of these up, but the few I did check were real citations, and they all fall within its training window too. I wonder if this has to do with the temperature?
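For anyone who wants to reproduce this, the call was shaped roughly like the sketch below. This is just an illustration against the legacy `openai` Python SDK (pre-1.0); the model name, system prompt, and user prompt are the ones from the transcript above, and the rest is boilerplate I'm assuming rather than anything from this thread. Temperature 0 makes decoding effectively greedy: at each step the model emits its single most likely token instead of sampling, so runs are close to deterministic.
====== SKETCH (Python) ======
# Rough reproduction of the transcript above using the legacy OpenAI
# Python SDK (openai < 1.0). Assumes OPENAI_API_KEY is set in the environment.
import openai

response = openai.ChatCompletion.create(
    model="gpt-4",
    temperature=0,  # (near-)greedy decoding: take the most likely token each step
    messages=[
        {"role": "system",
         "content": "You are a helpful research assistant who assists researchers."},
        {"role": "user",
         "content": "Can you give me citations for recent research on trees?"},
    ],
)
print(response.choices[0].message["content"])
================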
I was just using the ChatGPT web UI (with GPT-4), so I didn't manually set the system prompt or the temperature. I checked all of yours and they do exist, with the right researchers. I tried again in the web UI and it generated junk; when I told it to only cite articles from 2020 and 2021, it got them mostly correct, but it invented one and put the wrong authors on several.
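If you want to spot-check citations like these without trawling Google Scholar by hand, one quick way is to look each DOI up against Crossref's public REST API: a fabricated citation usually has a DOI that returns 404 or resolves to an entirely different title and authors. A minimal sketch follows; the only thing taken from this thread is the first DOI from the list above, while the helper name and the rest are my own illustration.
====== SKETCH (Python) ======
# Spot-check a citation by resolving its DOI through Crossref's public API.
import requests

def check_doi(doi: str) -> None:
    """Print what a DOI actually resolves to, or flag it as unresolvable."""
    resp = requests.get(f"https://api.crossref.org/works/{doi}", timeout=10)
    if resp.status_code != 200:
        print(f"{doi}: no Crossref record (likely fabricated)")
        return
    work = resp.json()["message"]
    title = (work.get("title") or ["<no title>"])[0]
    authors = ", ".join(a.get("family", "?") for a in work.get("author", []))
    print(f"{doi}: {title!r} by {authors}")

# First DOI from the list above (Bastin et al., 2019)
check_doi("10.1126/science.aax0848")
================
Note that a resolving DOI only proves the paper exists; you still have to compare the returned title and authors against what the model printed, since it sometimes attaches a real DOI to the wrong paper.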
GPT-4 being behind a paywall is genius: that way only the true believers will use it, discussions are once again shut down, and we only get to see cherry-picked articles about it. Just like GPT-3 before ChatGPT came along for free and everyone could see how bad it was.
It's like the Google researcher who claimed their model was sentient because he heavily coached it to give the output he wanted and ignored the output that didn't confirm his belief.
In that case, we've had AGI since ELIZA
If this were the standard in other areas of science, we would still be using the miasma theory of disease and bloodletting women for their hysteria. AI "research" is a bunch of motivated people throwing stuff at the wall and claiming that whatever sticks makes them scientists. Never mind that all of these groups are HEAVILY motivated to overhype their models.
People are also quick to aggrandize or anthropomorphize these systems because they lack a fundamental understanding of the underlying mathematics and mechanisms that allow these models to function as well as they do. There is an inherent danger in how much trust people are willing to grant to something like a language (or image) model simply because it is so adept at synthesis.
This is a great cause of concern to me, especially while it is not particularly difficult to puncture the general credibility of these models. In the near future it will not be so easy, and I anticipate that many (even the well-educated) will surrender their judgment to these systems with little to no scrutiny.
The hype from these companies is contributing to this problem and I consider it to be highly unethical.
Someone else who obviously doesn't have GPT-4 or plugins beta access.
It has comprehension alright.