Just a note: I think it's 3.5 for the code work. 4 would probably be prohibitively expensive to run, and they carefully mention that they use 4 for the PRs and a few other bits but not for the code gen; there they just talk about ChatGPT. I'd love to be wrong about this.