It is fine-tuned to maximize reward, though, not likelihood. And it provides an answer in both cases, just not as good a one.
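The distinction can be sketched with a toy categorical "policy" over a tiny vocabulary. Everything here is illustrative (the logits, target, and per-token rewards are made up): pretraining minimizes negative log-likelihood of the data, while RLHF-style tuning maximizes expected reward under the model's own distribution, where the reward comes from a separate reward model rather than from the data.

```python
import numpy as np

# Toy next-token distribution over a 3-token vocabulary (illustrative numbers).
logits = np.array([2.0, 0.5, -1.0])
probs = np.exp(logits) / np.exp(logits).sum()

# Likelihood (pretraining) objective: maximize log p(target token),
# i.e. minimize the negative log-likelihood of the observed data.
target = 0
nll_loss = -np.log(probs[target])

# RLHF-style objective: maximize expected reward under the policy.
# The reward for each token comes from a (hypothetical) reward model,
# not from any ground-truth target in the data.
rewards = np.array([1.0, -0.5, 0.2])
expected_reward = (probs * rewards).sum()
```

The key difference: the likelihood loss only "sees" the target token, while the reward objective weights every possible output by how much the reward model likes it.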



So since the model is fine-tuned via RLHF, my point doesn't stand?

Genuine question; it would be interesting if some other mechanism were at play here.


For a given answer, I would expect it to get the same reward under both question orderings. So, naively, I would expect it not to be affected by the ordering.
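The intuition can be sketched with a toy reward function that scores an answer against the set of words in the prompt, so it cannot depend on which question comes first. This is purely illustrative: a real reward model is a learned network over the full token sequence, and nothing forces it to be order-invariant.

```python
# Hypothetical order-insensitive reward: score an answer by keyword
# overlap with the union of words from all prompt parts. Because it
# uses a set union, the order of the questions cannot matter.
def reward(prompt_parts, answer):
    keywords = set()
    for part in prompt_parts:
        keywords |= set(part.lower().split())
    return len(keywords & set(answer.lower().split()))

q1 = "what is the capital of France"
q2 = "name a European city"
ans = "Paris is the capital of France"

# Same reward regardless of question ordering.
r_ab = reward([q1, q2], ans)
r_ba = reward([q2, q1], ans)
```

A learned reward model reads the prompt as one ordered token sequence, so the invariance this toy function has by construction is exactly what a real model might fail to exhibit.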



