These comments are filled with misunderstandings of the result. There were three groups of kids:

1. Control, with no LLM assistance at any time.

2. "GPT Base", raw ChatGPT as provided by OpenAI.

3. "GPT Tutor", improved by the researchers to provide hints rather than complete answers and to make fewer mistakes on their specific problems.

On study problem sets ("as a study assistant"), kids with access to either GPT did better than control.

When GPT access was subsequently removed from all participants ("on tests"), the kids who studied with "GPT Base" did worse than control. The kids with "GPT Tutor" were statistically indistinguishable from control.




Changing things almost always improves results; that is the first rule to remember in education testing. Most of the improvements disappear once you make the change standard.

This effect likely comes from novelty: the new thing is more interesting, so kids are more alert, but once they are used to it, it becomes the same old boring thing and education results go back to normal. Of course some changes genuinely improve or worsen outcomes, but in general it is really hard to tell; an intervention needs a massive advantage over the standard during testing to produce any real improvement, and most of the time you just make things worse.


Reading comprehension is really awful nowadays, or people tend to comprehend only what confirms their prior beliefs. The sad part is that none of those people will ever realize the errors in their comprehension of the article. That is exactly one of the mechanisms by which people form wrong opinions; it compounds and is almost impossible to change.

The takeaway (besides the research still needing replication) should be to control the type of AI assistant given to students: one that provides tutoring rather than answers to copy. OpenAI should be pushed to develop such a "student mode" immediately, and parents and educators need to be made aware of it so they can ensure students are using it. Otherwise students will do much worse on tests, because they will just ask it for answers to copy into assignments.


Assuming that kids with a "Human Tutor" would have been statistically better than control (there was no such group in the study, so we will not know), this is a very poor showing for ChatGPT.



