"Robustness of Model-Graded Evaluations and Automated Interpretability" (2023) https://www.lesswrong.com/posts/ZbjyCuqpwCMMND4fv/robustness... :
> The results inspire future work and should caution against unqualified trust in evaluations and automated interpretability.
From https://news.ycombinator.com/item?id=37451534 : additional benchmarks: TheoremQA, LegalBench
"Robustness of Model-Graded Evaluations and Automated Interpretability" (2023) https://www.lesswrong.com/posts/ZbjyCuqpwCMMND4fv/robustness... :
> The results inspire future work and should caution against unqualified trust in evaluations and automated interpretability.
From https://news.ycombinator.com/item?id=37451534 : add'l benchmarks: TheoremQA, Legalbench