It wasn't clear to me which evaluation method was being used. The chart in the blog post says Execution Accuracy, but the numbers appear to correlate with "Exact Set Match" (comparing the SQL itself) rather than "Execution With Values" (comparing the values in the result sets). For example, DIN-SQL + GPT-4 achieves an 85.3% "Execution With Values" score. Is that what is being used here?
Hello, thank you very much for your meticulous comment. The 85.3% accuracy reported in our paper (I'm one of the authors of the DIN-SQL paper) is on the test set. In the blog post, however, we report performance on the development set, which is 74.2%.
See the following for more info:
https://yale-lily.github.io/spider
https://github.com/taoyds/spider/tree/master/evaluation_exam...
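For readers unfamiliar with the two metrics, here is a minimal sketch of how they can disagree on the same prediction. This is not the official Spider evaluator (the real Exact Set Match implementation parses queries and compares clause components as sets); the helper names and toy schema below are my own illustration.

```python
import sqlite3

def exact_set_match(pred_sql: str, gold_sql: str) -> bool:
    # Crude stand-in for "Exact Set Match": compares the SQL text itself
    # after normalizing whitespace and case. The real evaluator parses the
    # queries and compares clause components, but the key point holds:
    # it judges the SQL, not the data it returns.
    norm = lambda s: " ".join(s.lower().split())
    return norm(pred_sql) == norm(gold_sql)

def execution_match(pred_sql: str, gold_sql: str,
                    conn: sqlite3.Connection) -> bool:
    # Stand-in for "Execution With Values": run both queries and compare
    # the (unordered) result sets, including the actual values.
    pred_rows = sorted(conn.execute(pred_sql).fetchall())
    gold_rows = sorted(conn.execute(gold_sql).fetchall())
    return pred_rows == gold_rows

# Toy database to show the metrics can disagree.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE singer (name TEXT, age INTEGER)")
conn.executemany("INSERT INTO singer VALUES (?, ?)", [("A", 30), ("B", 40)])

gold = "SELECT name FROM singer WHERE age > 35"
pred = "SELECT name FROM singer WHERE age >= 36"  # different SQL, same rows

print(exact_set_match(pred, gold))        # False: the SQL text differs
print(execution_match(pred, gold, conn))  # True: both return [('B',)]
```

A prediction can thus fail one metric and pass the other, which is why the two scores for the same model on the same split generally differ.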