Hacker News

We have 20+ services in prod that use LLMs, so I have 50k (or more) of data per service per day to evaluate. The question is: do people actually evaluate properly?

And how do you do an apples-to-apples evaluation of such squishy services?


