Yes, we have already submitted the model for evaluation on the Spider holdout test set. While your suggestion is certainly intriguing, implementing a universal solution could be quite challenging, as it would heavily depend on the specifics of the dataset.
I don’t think it’s necessarily about a “universal” solution, just “better”. Make the column names more verbose, changing numeric enums to text ones, disambiguating column names, etc. One of the spider datasets is a stadium table and one of the column names is “average”, which means average capacity, but it’s super ambiguous. If you asked an LLM to “make these table columns more verbose” I bet it would call that “average_capacity” and all of the sudden some NLQ queries that confused the function and the column name would start to work.