My experience using GPT-4 Turbo on math problems can be divided into three cases, based on the prompt I use:
1. Text only prompt
2. Text + Image with supplemental data
3. Text + Image with redundant data
Case 1 generally performs best. I also found that reasoning improves if I convert the equations into LaTeX form; the model is less prone to hallucinate when the input data is formulaic and standardized.
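As a minimal sketch of what I mean by Case 1, here is one way to assemble a text-only prompt with the equations wrapped in LaTeX display-math delimiters. The helper name, problem text, and equation below are purely illustrative, not from any actual session:

```python
# Hypothetical sketch: a text-only prompt (Case 1) with equations
# expressed in LaTeX rather than loose ASCII. Names are illustrative.

def build_prompt(problem: str, equations: list[str]) -> str:
    """Combine a problem statement with LaTeX-formatted equations."""
    eq_block = "\n".join(f"$$ {eq} $$" for eq in equations)
    return f"{problem}\n\nEquations:\n{eq_block}\n\nSolve step by step."

prompt = build_prompt(
    "Find x satisfying the constraint below.",
    [r"x^2 - 5x + 6 = 0"],
)
print(prompt)
```

The point is simply that the equation reaches the model in a standardized notation, rather than as an ambiguous plain-text rendering like `x^2 - 5x + 6 = 0` embedded mid-sentence.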
Cases 2 and 3 are more unpredictable. With a bit of prompt engineering, they may produce the right answer after a few attempts, but most of the time they make simple logical errors that could easily be avoided. I also found that multimodal models tend to misinterpret the problem premise, even when all the information is provided in the text prompt.