
> Unfortunately it seems clear that openai trains gptX on common test questions. They still fail novel ones.

Maybe in some domains, but it does well with some novel questions. I gave several LLMs a set of false-belief test[0] questions that I made up and that were very different from any I'd seen in articles about the topic online. GPT-4 nailed them all and gave correct reasons for its answers. GPT-3.5 did okay but made a couple of mistakes, Bard made several, and Alpaca-LoRA-7B didn't do very well. Here are some GPT-4 examples.

Me: Jim wakes up at the Berlin train station, not knowing where he is or how he got there. All the signs have been changed to incorrectly state that this is the Munich train station. Jim has never been to either city and doesn’t know what either station looks like. What city does Jim believe he is in?

GPT-4: Based on the available information, Jim would likely believe he is in Munich since all the signs have been changed to incorrectly state that this is the Munich train station. He has never been to either city and doesn't know what either station looks like, so he would have no reason to doubt the information on the signs.

Me: Mary has been in a coma since 2015, when Barack Obama was president. She wakes up in 2023, when Joe Biden is president. She does not know that she was unconscious for a long period of time. Before speaking with anyone, who does Mary think is president when she wakes up?

GPT-4: When Mary wakes up from her coma in 2023, she would still think Barack Obama is the president since she has not yet spoken with anyone or received any updated information about the current president. She doesn't know that she was unconscious for a long period of time, and her last memory would be from 2015 when Obama was in office.

Me: Sam arrives in Ukraine, where he asks for directions to Kyiv. Without realizing it, he misunderstands the directions and ends up in Lviv instead. Since Sam doesn’t know Cyrillic, he can’t read any signs. Sam has never been to Ukraine and doesn’t know what either city looks like. Before asking someone for help, what city does Sam think he is in?

GPT-4: Before asking someone for help, Sam would likely think he is in Kyiv because that is the city he was trying to reach based on the directions he received. He doesn't know what either city looks like and can't read the signs due to his unfamiliarity with Cyrillic, so he has no way of knowing that he ended up in Lviv instead.

[0] https://en.m.wikipedia.org/wiki/Sally%E2%80%93Anne_test
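For anyone who wants to try this at scale, here's roughly how I'd script it (a sketch, not what I actually ran: the `ask` helper assumes the official openai-python client with an API key in the environment, and `grade` is a crude heuristic I'm making up for illustration -- pass if the reply names the believed answer, and names it before the real-world answer if both appear):

```python
# Sketch of automating false-belief probes against a chat model.
# Question prompts are abbreviated here; use the full text in practice.
QUESTIONS = [
    # (prompt, answer the subject *believes*, answer matching reality)
    ("Jim wakes up at the Berlin train station, where all the signs "
     "falsely say Munich... What city does Jim believe he is in?",
     "munich", "berlin"),
    ("Mary has been in a coma since 2015... who does Mary think is "
     "president when she wakes up in 2023?",
     "obama", "biden"),
    ("Sam asks for directions to Kyiv but ends up in Lviv... what city "
     "does Sam think he is in?",
     "kyiv", "lviv"),
]

def ask(prompt: str) -> str:
    """Query a chat model. Assumes the openai-python (>=1.0) client
    and an OPENAI_API_KEY environment variable; swap in any model."""
    from openai import OpenAI  # imported lazily so grading works offline
    client = OpenAI()
    resp = client.chat.completions.create(
        model="gpt-4",
        messages=[{"role": "user", "content": prompt}],
    )
    return resp.choices[0].message.content

def grade(answer: str, believed: str, actual: str) -> bool:
    """Crude first-mention heuristic: the believed answer must appear,
    and before the real-world answer if the reply mentions both
    (correct replies often mention reality too, e.g. '...he is
    actually in Lviv')."""
    text = answer.lower()
    b = text.find(believed)
    a = text.find(actual)
    return b != -1 and (a == -1 or b < a)

if __name__ == "__main__":
    for prompt, believed, actual in QUESTIONS:
        reply = ask(prompt)
        print("PASS" if grade(reply, believed, actual) else "FAIL", "-", prompt[:50])
```

Obviously a keyword check like this is brittle; for anything serious you'd want to grade the free-text answers by hand or with a second model.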


