> whole of English Wikipedia baked into them (IIRC it constitutes the bulk of the training data for pretty much all of them)
Not a dig at anything you're saying (I agree that just shoving a link into an LLM and asking for a summary is a horrendous stand-in for learning), but it's worth correcting that Wikipedia is a very small fraction of the training corpus for LLMs these days — certainly under 1%.