For handling cross-reference clues, do you think it would be feasible in the future to feed the QA model a representation of the partially-filled puzzle (perhaps only in the refinement step - hard to do for the first step before you have any answers!), in order to give it a shot at answering clues that require looking at other answers?
It feels like the main challenges would be that most clues are not cross-referential, and even for those that are, most of the information in the puzzle is irrelevant: you only care about one answer among many, so it could be difficult for the model to learn to find the information it needs.
But maybe this sort of thing would also be helpful for theme puzzles, where answers might be united by the theme even if their clues are not directly cross-referential, and could give enough signal to teach the model to look at the puzzle context?
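To make the suggestion concrete, here is a minimal sketch of what feeding the partially-filled puzzle to the QA model might look like at the refinement step. Everything here is invented for illustration: the function name, the `[GRID ...]` serialization format, and the slot labels are all assumptions, not anything from an existing system.

```python
# Hypothetical sketch: serialize the partially-filled puzzle state as extra
# textual context for a QA model's refinement step, so it has a shot at
# cross-reference clues like "See 17-Across".

def build_clue_context(clue, filled_answers, max_refs=20):
    """Prepend currently-filled answers to a clue string.

    clue: the clue text to answer.
    filled_answers: dict mapping slot labels (e.g. "17A", "23D") to answer
        strings; unfilled slots are simply absent.
    max_refs: cap on how many answers to include, since most of the grid is
        irrelevant to any one clue.
    """
    refs = sorted(filled_answers.items())[:max_refs]
    context = " ; ".join(f"{slot}: {ans}" for slot, ans in refs)
    # With no filled answers yet (the first pass), fall back to the bare clue.
    return f"[GRID {context}] {clue}" if context else clue


partial = {"17A": "MOON", "23D": "RIVER"}
print(build_clue_context("With 23-Down, a Mancini standard", partial))
# The model then sees the clue together with the grid snapshot, so "23-Down"
# is resolvable from context rather than a dead reference.
```

On the first pass, before any answers exist, the context is empty and the clue is passed through unchanged, which matches the point above that this only makes sense in the refinement step.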