
(The thread can't nest any deeper than this, unfortunately.) One thing not to forget when thinking about decoder transformer models is that there is no limitation on having parts of the input/output stream computed by other circuits if that helps the cause. E.g., emit a token to invoke a calculator, compute the answer, and splice it into the stream; or emit a token to compile and run code and fill the stream with the results. The complexity class of the main circuit need not be much higher than that of today's typical 200-layer-deep architectures, as long as it has access to memory and tools. You can call this system something else if you prefer (decoder-transformer-plus-computer), but that is what people actually interact with in ChatGPT, so I'm not sure I agree that complexity theory limits superhuman ability. Humans are not good with complexity either.
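As a minimal sketch of the idea (all names here are hypothetical, not any real API): an outer loop watches the model's output stream for a tool-call marker, runs the requested computation, and splices the result back in before generation continues.

```python
import re

# Hypothetical tool-call marker: the model emits <calc>expr</calc>
# and the surrounding "computer" fills in the evaluated result.
CALC = re.compile(r"<calc>(.*?)</calc>")

def run_tool_calls(text: str) -> str:
    """Replace each <calc>expr</calc> span with its evaluated result."""
    def evaluate(match: re.Match) -> str:
        # Restricted eval: arithmetic only, no builtins or names exposed.
        return str(eval(match.group(1), {"__builtins__": {}}, {}))
    return CALC.sub(evaluate, text)

def fake_model(prompt: str) -> str:
    """Stand-in for a decoder model that emits a tool-call token."""
    return prompt + " The answer is <calc>6*7</calc>."

# The main "circuit" stays simple; the calculator circuit does the arithmetic.
output = run_tool_calls(fake_model("Compute 6*7."))
print(output)  # -> Compute 6*7. The answer is 42.
```

Real systems use structured function-call messages rather than inline regex markers, but the division of labor is the same: the transformer decides *when* to call out, and an external circuit does the computation.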

