Anyone got a TL;DR on this?

vidarh · on July 1, 2024

Think about an AI with a 1-bit model. If you feed it data that can't possibly be classified with fewer than 2 bits, it can't get it precisely right, no matter how much data you train it on or what the 1 bit of the model represents.

For any given size of system, there will be a ceiling on what it can learn to classify or predict with precision.
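A rough way to see the ceiling is a pigeonhole count. The sketch below is only an illustration, not a claim about any real model: `TARGETS` and `best_coverage` are made-up names, and the "decoder" stands in for "what the bits of the model represent". It tries every possible meaning for the model's bits and counts how many of the 4 boolean functions of one input bit can be matched exactly.

```python
from itertools import product

# All 4 boolean functions of one input bit, written as (f(0), f(1)).
TARGETS = list(product([0, 1], repeat=2))

def best_coverage(model_bits: int) -> int:
    """Largest number of TARGETS that any model with `model_bits` bits
    of state can represent exactly, over every possible decoder."""
    n_states = 2 ** model_bits
    best = 0
    # A decoder assigns one function to each model state.
    # Brute force: try every such assignment (hypothetical setup).
    for decoder in product(TARGETS, repeat=n_states):
        best = max(best, len(set(decoder)))
    return best

print(best_coverage(1))  # a 1-bit model has only 2 states
print(best_coverage(2))  # a 2-bit model has 4 states
```

With 1 bit, no choice of decoder covers more than 2 of the 4 targets, so training can't help with the rest. With 2 bits, all 4 become reachable.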

I used "system" rather than "model" there for a reason:

Memory in any form, such as context, RAG, or API access to anything that can store and retrieve data, affects the maximum: a Turing machine can be implemented with a very small model plus a loop if there's access to an external memory that acts as the tape. But if the "tape" is limited, there will be some limit on what the total system can precisely classify.
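As a toy version of "small controller + loop + external memory", here is a sketch under loud assumptions: the controller is a few lines of fixed logic rather than a learned model, the "tape" is used as a simple stack rather than a full Turing machine tape, and `balanced` and `tape_cells` are hypothetical names. It classifies strings of parentheses as balanced or not, and returns `None` when the bounded memory runs out.

```python
def balanced(s: str, tape_cells: int):
    """Classify `s` as balanced (True) or not (False) using a fixed
    controller and an external memory of at most `tape_cells` cells.
    Returns None when the memory is too small to decide."""
    tape = []  # external memory; the controller keeps no counters itself
    for ch in s:
        if ch == "(":
            if len(tape) == tape_cells:
                return None  # out of tape: the system can't classify this
            tape.append(1)
        elif ch == ")":
            if not tape:
                return False  # closing bracket with nothing open
            tape.pop()
    return not tape  # balanced only if nothing is left open

print(balanced("(())", tape_cells=2))
print(balanced("((()))", tape_cells=2))
print(balanced("((()))", tape_cells=3))
```

The controller never changes; only the tape size does. Nesting depth 3 is out of reach with 2 cells and fine with 3, which is the "limited tape" ceiling in miniature.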