Hacker News new | past | comments | ask | show | jobs | submit login

My point was more so that this task is so trivial, that's it's not testing the model's ability to distinguish contextual nuances, which would supposedly be the intention.

The idea presented elsewhere in this thread about using an unpublished novel and then asking questions about the plot is sort of the ideal test in this regard, and clearly on the other end of the spectrum in terms of a design that's testing actual "understanding".




Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: