My experience using GPT-4 Turbo on math problems can be divided into three cases, based on the prompt I use:
1. Text only prompt
2. Text + Image with supplemental data
3. Text + Image with redundant data
Case 1 generally performs best. I also found that reasoning improves if I convert the equations into LaTeX form; the model is less prone to hallucinate when the input data is formulaic and standardized.
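As a minimal sketch of what I mean by Case 1, here is one way to assemble a text-only prompt with the equations wrapped in LaTeX display-math delimiters. The helper name, problem text, and equation below are purely illustrative, not from any actual session:

```python
# Hypothetical sketch: a text-only prompt (Case 1) with equations
# expressed in LaTeX rather than loose ASCII. Names are illustrative.

def build_prompt(problem: str, equations: list[str]) -> str:
    """Combine a problem statement with LaTeX-formatted equations."""
    eq_block = "\n".join(f"$$ {eq} $$" for eq in equations)
    return f"{problem}\n\nEquations:\n{eq_block}\n\nSolve step by step."

prompt = build_prompt(
    "Find x satisfying the constraint below.",
    [r"x^2 - 5x + 6 = 0"],
)
print(prompt)
```

The point is simply that the equation reaches the model in a standardized notation, rather than as an ambiguous plain-text rendering like `x^2 - 5x + 6 = 0` embedded mid-sentence.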
Cases 2 and 3 are more unpredictable. With a bit of prompt engineering, they may produce the right answer after a few attempts, but most of the time they make simple logical errors that could easily be avoided. I also found that multimodal models tend to misinterpret the problem premise, even when all the information is provided in the text prompt.