On code generation, I wonder how the order of operations between training and fine-tuning affects the output. What if, as an example, the model was first trained on the Python documentation and the code base for Python itself?
Then fine-tuning came from training on actual Python code from GitHub.
At that point the model understands the Python documentation and the implementation (standard library/interpreter). Does that reduce the amount of data needed for code generation, and therefore the size of the dataset used for it?
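A minimal sketch of that curriculum idea, assuming a Hugging Face Trainer setup with GPT-2 as a stand-in model; the tiny in-memory corpora are placeholders for real documentation and GitHub dumps, and the two-stage loop is just one way to express "docs first, code second":

```python
# Sketch: continue pretraining on documentation text, then fine-tune on
# Python source. The placeholder corpora below stand in for real data.
from datasets import Dataset
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer,
                          TrainingArguments)

tokenizer = AutoTokenizer.from_pretrained("gpt2")
tokenizer.pad_token = tokenizer.eos_token  # GPT-2 has no pad token by default
model = AutoModelForCausalLM.from_pretrained("gpt2")

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, max_length=512)

# Placeholder corpora -- swap in the Python docs and a GitHub code dump.
docs = Dataset.from_dict({"text": ["len(s) returns the number of items in s."]})
code = Dataset.from_dict({"text": ["def mean(xs):\n    return sum(xs) / len(xs)\n"]})

collator = DataCollatorForLanguageModeling(tokenizer, mlm=False)

# Stage 1: documentation; stage 2: actual code. The same model object is
# carried through, so stage 2 builds on whatever stage 1 learned.
for stage, corpus in [("docs", docs), ("github_code", code)]:
    trainer = Trainer(
        model=model,
        args=TrainingArguments(output_dir=f"out-{stage}", num_train_epochs=1,
                               per_device_train_batch_size=1),
        train_dataset=corpus.map(tokenize, batched=True, remove_columns=["text"]),
        data_collator=collator,
    )
    trainer.train()
```

Whether the second stage then needs less code data than training on code alone is exactly the open question.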
I do wonder if anyone is considering mixing larger and larger percentages of The Stack (https://huggingface.co/datasets/bigcode/the-stack) into this or the Pile to get more code and see what happens.
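Sweeping that mix ratio is easy to prototype with the datasets library's interleave_datasets and its probabilities argument. A rough sketch below; the toy corpora are placeholders for The Stack and the Pile (the real ones would be streamed from the Hub, and The Stack requires accepting its terms of use), and the specific fractions are arbitrary:

```python
from datasets import Dataset, interleave_datasets

# Toy stand-ins for The Stack (code) and the Pile (mostly prose).
stack_like = Dataset.from_dict({"text": [f"def f{i}(): pass" for i in range(100)]})
pile_like = Dataset.from_dict({"text": [f"prose document {i}" for i in range(100)]})

# Try progressively larger code fractions and check the resulting mix.
for code_fraction in (0.1, 0.3, 0.5):
    mixed = interleave_datasets(
        [stack_like, pile_like],
        probabilities=[code_fraction, 1 - code_fraction],
        seed=0,
    )
    n_code = sum(ex["text"].startswith("def") for ex in mixed)
    print(f"code fraction {code_fraction:.1f}: {n_code}/{len(mixed)} code examples")
```

With the default stopping strategy, sampling stops once one source runs out, so the realized ratio tracks the requested probabilities only approximately.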