Where does OpenAI state that they only train GPT-3.5 or GPT-4 on code from GitHub? The model behind GitHub Copilot X clearly has a (human) language understanding that you can't get from source code (or source code comments) alone, so these models must be trained on much more data than GitHub has, and there is no reason to believe OpenAI would limit itself to that.