I have a hard time with the "cannot reproduce" categorization. There are places...

littlestymaar · 2024-04-18T21:19:31.000000Z

But with generated code, what you end up with is still code that can be edited by whoever needs to. If AMD stopped maintaining their drivers, people would be maintaining the generated code. It wouldn't be a nice situation, but it would work. Model weights, by contrast, are akin to the binary blobs you get in the Android world, binary blobs that nobody calls open-source…

pama · 2024-04-19T04:10:59.000000Z

I personally think that the model artifacts are simply programs with tons of constants. Many math routines have constants in their approximations, and I don’t expect the source to always include the full derivation of these constants. I see LLMs as the same category, but with (much) larger sets of parameters. What makes LLMs better than some of the mathematical constants in complicated function approximations is that I can go and keep training an LLM, whereas the math/engineering libraries might not make it easy for me to modify them without also figuring out the details that led to those particular parameter choices.