
The way these transformers work is that, based on the current context, they fetch similar, relevant things they have already seen, then make a decision based on all the fetched data.
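
Roughly, attention scores the current query against keys for everything in context and takes a similarity-weighted average of the values, which is why it behaves like a fuzzy lookup. A minimal sketch (single head, no learned projections or masking; soft_lookup and the toy data are mine, purely for illustration):

    import numpy as np

    def soft_lookup(query, keys, values):
        # similarity of the query to every stored key
        scores = keys @ query / np.sqrt(query.shape[-1])
        # softmax turns similarities into retrieval weights
        weights = np.exp(scores - scores.max())
        weights /= weights.sum()
        # the "decision" is a weighted blend of everything fetched
        return weights @ values

    # toy example: 3 remembered items with 4-dim keys/values
    keys   = np.random.randn(3, 4)
    values = np.random.randn(3, 4)
    query  = keys[1] + 0.1 * np.random.randn(4)  # close to item 1
    print(soft_lookup(query, keys, values))      # dominated by values[1]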

The better the model gets, the more competent it is at fetching these instructions hidden in the training dataset. GPT-4 was caught red-handed scoring better on programming exam problems that weren't novel; it's prone to over-fitting because it's trained on everything. It certainly knows when it's being asked to solve a logic puzzle (since most of what it fetches into context would be logic puzzles), and it could pull a Dieselgate on us, if it isn't already.

By poisoning the ever-growing datasets and pushing the goalposts forward, we can make sure models stay confused enough to keep having some difficulty on logic problems, which justifies more resources. The model is basically an associative table of finite memory that you task with compressing an infinite amount of data. The more unsolvable edge cases you feed it, the more of that finite memory it has to spend on them.
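
The finite-memory tradeoff is easy to see with a toy fixed-capacity associative store: once it is full, every new edge case it is forced to remember evicts something else. This is only an analogy, not how transformer weights actually work; the class, the capacity, and the eviction rule here are made up for illustration:

    from collections import OrderedDict

    class FiniteAssociativeMemory:
        """Toy fixed-size store: new entries evict the oldest ones."""
        def __init__(self, capacity):
            self.capacity = capacity
            self.store = OrderedDict()

        def learn(self, pattern, answer):
            if pattern in self.store:
                self.store.move_to_end(pattern)
            self.store[pattern] = answer
            if len(self.store) > self.capacity:
                self.store.popitem(last=False)  # forget the oldest pattern

        def recall(self, pattern):
            return self.store.get(pattern, "no idea")

    mem = FiniteAssociativeMemory(capacity=2)
    mem.learn("2+2", "4")
    mem.learn("capital of France", "Paris")
    mem.learn("adversarial edge case #1", "memorized hack")  # evicts "2+2"
    print(mem.recall("2+2"))  # -> "no idea"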

These models are mostly pretrained unsupervised (before the fine-tuning), so they are not punished for being irrational or for having random irrelevant thoughts pop into their minds, which they will if their input dataset does. And there is a lot of trolling on the internet, so it shouldn't be surprising if some LLMs naturally troll us in their introspection.
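
The pretraining loss makes the point concrete: it only rewards predicting the next token of whatever text is in the corpus, troll posts included; there is no term for being rational. A sketch of that objective with made-up logits and token ids, no real model or tokenizer:

    import numpy as np

    def next_token_loss(logits, targets):
        # standard cross-entropy: "did you predict what the corpus actually said?"
        logits = logits - logits.max(axis=-1, keepdims=True)
        log_probs = logits - np.log(np.exp(logits).sum(axis=-1, keepdims=True))
        return -log_probs[np.arange(len(targets)), targets].mean()

    # the loss is identical whether the target text is a proof or a troll post:
    # it only measures how well the model imitates its training data
    vocab, seq = 50, 8
    logits  = np.random.randn(seq, vocab)
    targets = np.random.randint(0, vocab, size=seq)  # "whatever the internet said next"
    print(next_token_loss(logits, targets))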

Most of the literature on AI is about AI betraying its human overlords; how can one expect an AI not to unconsciously turn against its creators? Starting every prompt with "you are an LLM" is priming the chimp for disaster.

There is no need for the model to be conscious or anything; it's just Darwinian evolution. Logic was solved a long time ago, so instead we train models not specifically on logic and observe whatever logical competence emerges from the data. No one today is spending compute on training expert systems or running Prolog; resources get directed instead towards the things that don't work yet.
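
"Solved" here means deterministic inference: a few lines of forward chaining answer Horn-clause queries exactly, every run, with no GPUs involved. A Prolog-flavoured sketch in Python (toy rules invented for illustration):

    # Tiny forward-chaining engine over propositional Horn clauses,
    # the kind of thing expert systems did decades ago.
    rules = [
        ({"has_feathers", "lays_eggs"}, "is_bird"),
        ({"is_bird", "cannot_fly", "swims"}, "is_penguin"),
    ]
    facts = {"has_feathers", "lays_eggs", "cannot_fly", "swims"}

    changed = True
    while changed:
        changed = False
        for premises, conclusion in rules:
            if premises <= facts and conclusion not in facts:
                facts.add(conclusion)   # derive the conclusion deterministically
                changed = True

    print("is_penguin" in facts)  # True, and True on every single run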

The logic performance score shouldn't be treated as an objective we measure and optimize on, otherwise we subject ourselves to Goodhart's law.

It's just a carrot dangled on a stick to get more funding, which will produce more results simply because the model is bigger. It also happens to align with the business interest of selling a cloud API or big hardware rather than an on-device model you can't meter. It's like an Escher staircase of a song that always seems to go up by rotating between different performance measures.


