They're all using these tests to determine their worth, but to be honest, the results don't translate well to real-world use.
For example, I used DeepSeek for coding daily for about two months (having used ChatGPT before), and its output was terrible. It would produce buggy code, break existing code when making additions, completely fail to understand what you're asking, etc.