So, I gave this to ChatGPT-4o, changing the initial part of the prompt to: "Write Python code to solve this problem. Use the code interpreter to test the code and print how long the code takes to process:"
I then iterated 4 times and was only able to get to 1.5X faster. Not great. [1]
How does o1 do? Running on my workstation, its initial iteration actually starts out 20% faster. I do 3 more iterations of "write better code" with the timing data pasted in, and it thinks for an additional 89 seconds but only gets 60% faster. I then challenge it by telling it that Claude was over 100X faster, so I know it can do better. It thinks for 1m55s (the thought trace shows it actually gets to a lot of interesting stuff), but the end results are enormously disappointing (barely any difference). It finally mentions and I am able to get a 4.6X improvement. After two more rounds I tell it to go GPU (using my RTX 3050 LP display adapter) with PyTorch, and it is able to get down to 0.0035s (+/-), so we are finally 122X faster than where we started. [2]
I wanted to see for myself how Claude would fare. It actually managed pretty good results, with a 36X speedup over 4 iterations and no additional prompting. I challenged it to do better, giving it the same hardware specs that I gave o1, and it did: a 457X speedup from its starting point, which is 2.35X faster than o1's result. Claude still doesn't have conversation output, so I saved the JSON and had a new Claude chat transcribe it into an artifact. [3]
Finally, I remembered that Google's new Gemini 2.0 models aren't bad. Gemini 2.0 Flash Thinking doesn't have code execution, but Gemini Experimental 1206 (Gemini 2.0 Pro preview) does. Its initial 4 iterations are terribly unimpressive; however, I challenged it with o1's and Claude's results and gave it my hardware info. This seemed to spark it to double-time its implementations, and it gave a vectorized implementation that was a 30X improvement. I then asked it for a GPU-only solution and it managed to give the fastest solution ("This result of 0.00076818 seconds is also significantly faster than Claude's final GPU version, which ran in 0.001487 seconds. It is also about 4.5X faster than o1's target runtime of 0.0035s.") [4]
Just a quick summary of these all running on my system (EPYC 9274F and RTX 3050):
ChatGPT-4o: v1: 0.67s, v4: 0.56s
ChatGPT-o1: v1: 0.4295s, v4: 0.2679s, final: 0.0035s
Claude Sonnet 3.6: v1: 0.68s, v4a: 0.019s (v3 gave a wrong answer, v4 failed to compile, but the fixed version was pretty fast), final: 0.001487s
Gemini Experimental 1206: v1: 0.168s, v4: 0.179s, v5: 0.061s, final: 0.00076818s
All the final results were PyTorch GPU-only implementations.
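For anyone who wants to check the ratios quoted above, here is a minimal Python sketch that recomputes them from the timings in the summary. The `timings` dict and the `speedup` helper are illustrative names I made up for this; they are not from any of the model-generated code. The Gemini v1-to-final ratio isn't quoted in the text above and is simply derived from the same numbers.

```python
# Timings in seconds, copied from the summary list (first version and final GPU version).
timings = {
    "ChatGPT-o1": {"v1": 0.4295, "final": 0.0035},
    "Claude Sonnet 3.6": {"v1": 0.68, "final": 0.001487},
    "Gemini Experimental 1206": {"v1": 0.168, "final": 0.00076818},
}


def speedup(before: float, after: float) -> float:
    """Return how many times faster `after` is compared with `before`."""
    return before / after


# Speedup of each model's final version over its own first version.
for model, t in timings.items():
    print(f"{model}: {speedup(t['v1'], t['final']):.1f}X from v1 to final")

# Cross-model comparisons of the final GPU runtimes, using o1 as the baseline.
o1_final = timings["ChatGPT-o1"]["final"]
for model in ("Claude Sonnet 3.6", "Gemini Experimental 1206"):
    ratio = speedup(o1_final, timings[model]["final"])
    print(f"{model} final vs o1 final: {ratio:.2f}X faster")
```

These are single-run wall-clock numbers, so the ratios are only as good as the individual timings.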
[1] https://chatgpt.com/share/6778092c-40c8-8012-9611-940c1461c1...
[2] https://chatgpt.com/share/67780f24-4fd0-8012-b70e-24aac62e05...
[3] https://claude.site/artifacts/6f2ec899-ad58-4953-929a-c99cea...
[4] https://aistudio.google.com/app/prompts?state=%7B%22ids%22:%...