>Every other LLM I've tried, including o3-mini-high: Fill the 12-liter jug completely. Pour it into the 6-liter jug.
Try it with a 12L jug and a 4L jug and ask for 4L. See whether it tells you to just fill the 4L jug, or to fill the 12L jug and pour into the 4L jug twice, discarding both times, so that 4L remains in the 12L jug.
The second answer is still technically correct, but it demonstrates that there's no real "reasoning" happening, just regurgitation of training data.
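For comparison, here is a rough sketch of what a plain search finds for this puzzle. It is just a breadth-first search over jug states. The function name `shortest_plan` and the labels "A" (the big jug) and "B" (the small jug) are made up for illustration. BFS returns a plan with the fewest moves, so the roundabout fill-pour-discard-pour-discard answer (five moves) never comes out of it.

```python
from collections import deque

def shortest_plan(cap_a, cap_b, target):
    """Breadth-first search over (a, b) jug states.

    Returns the shortest list of moves that leaves `target` liters
    in either jug, or None if the target is unreachable.
    """
    start = (0, 0)
    queue = deque([(start, [])])
    seen = {start}
    while queue:
        (a, b), plan = queue.popleft()
        if target in (a, b):
            return plan
        pour_ab = min(a, cap_b - b)  # how much A can pour into B
        pour_ba = min(b, cap_a - a)  # how much B can pour into A
        moves = [
            ("fill A", (cap_a, b)),
            ("fill B", (a, cap_b)),
            ("empty A", (0, b)),
            ("empty B", (a, 0)),
            ("pour A into B", (a - pour_ab, b + pour_ab)),
            ("pour B into A", (a + pour_ba, b - pour_ba)),
        ]
        for name, state in moves:
            if state not in seen:
                seen.add(state)
                queue.append((state, plan + [name]))
    return None

print(shortest_plan(12, 4, 4))  # ['fill B']
print(shortest_plan(12, 6, 6))  # ['fill B']
```

In both cases the shortest plan is a single move: fill the small jug.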