I like looneysquash's viewpoint about the definition of open source AI. You will need to have all parts involved open-sourced to make a model "open", not just the weights:
> The trained model is object code. Think of it as Java byte code. You have some sort of engine that runs the model. That's like the JVM, and the JIT. And you have the program that takes the training data and trains the model. That's your compiler, your javac, your Makefile and your make. And you have the training data itself, that's your source code.
> Each of the above pieces has its own source code. And the training set is also source code. All those pieces have to be open to have a fully open system. If only the training data is open, that's like having the source, but the compiler is proprietary. If everything but the training set is open, well, that's like giving me gcc and calling it Microsoft Word.
> You will need to have all parts involved open-sourced to make a model "open", not just the weights
How do you propose to open-source terabytes of scraped web text? They give you what they can: the paper, the code, and the model weights. You can reimplement the code, and the weights are open for you to do what you like with them.
https://news.ycombinator.com/item?id=41952722