Hacker News
jumpCastle
on June 18, 2023
on: The Secret Sauce behind 100K context window in LLM...
It is fine-tuned to maximize reward, though, not likelihood. And it provides an answer in both cases, just not as well.
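Here is a toy sketch of the distinction. It is purely illustrative and is not how any particular LLM is actually trained. There is one prompt, three candidate answers, and a softmax policy over them. Likelihood training raises log p(target) for the answer seen in the data. Reward fine-tuning raises the expected reward sum_a p(a) r(a). The KL penalty toward the base model that real RLHF setups add is omitted. The logits, target index, and reward values are all hypothetical.

```python
import math


def softmax(logits):
    # Numerically stable softmax over a plain list of floats.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def likelihood_grad(logits, target):
    # Gradient of log p(target) w.r.t. the logits: onehot(target) - p.
    p = softmax(logits)
    return [(1.0 if i == target else 0.0) - p_i for i, p_i in enumerate(p)]


def reward_grad(logits, rewards):
    # Gradient of the expected reward sum_a p(a) * r(a): p_i * (r_i - E_p[r]).
    p = softmax(logits)
    baseline = sum(p_i * r_i for p_i, r_i in zip(p, rewards))
    return [p_i * (r_i - baseline) for p_i, r_i in zip(p, rewards)]


def train(logits, grad_fn, steps=200, lr=0.5):
    # Plain gradient ascent on the logits; returns the final answer distribution.
    z = list(logits)
    for _ in range(steps):
        g = grad_fn(z)
        z = [z_i + lr * g_i for z_i, g_i in zip(z, g)]
    return softmax(z)


start = [2.0, 1.0, 0.5]    # hypothetical initial logits for answers 0, 1, 2
target = 0                 # hypothetical answer present in the training text
rewards = [0.2, 1.0, 0.0]  # hypothetical reward-model scores for answers 0, 1, 2

p_mle = train(start, lambda z: likelihood_grad(z, target))
p_rl = train(start, lambda z: reward_grad(z, rewards))
print("likelihood-trained:", [round(p, 3) for p in p_mle])
print("reward-trained:    ", [round(p, 3) for p in p_rl])
```

With these made-up numbers, the likelihood objective keeps answer 0 (the one in the data) on top. The reward objective moves probability mass to answer 1, which is the answer the assumed reward model scores highest.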
stoniejohnson
on June 18, 2023
So since a model is fine-tuned via RLHF, my point doesn't stand?
Genuine question; it would be interesting if some other mechanism were at play here.
jumpCastle
on June 18, 2023
For a given answer, I would expect it to get the same reward under both question orderings. So, naively, I would expect it not to be affected by the ordering.
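Here is a sketch of that intuition under heavily simplifying assumptions. The reward depends only on the answer, so it is identical for both orderings. Each ordering is treated as a separate prompt with its own hypothetical starting preferences over three candidate answers. The reward values and starting logits are made up. The sketch ignores shared parameters and the KL penalty toward the base model that is used in practice. Exact expected-reward gradient ascent then pushes both orderings toward the same highest-reward answer, even though they start out preferring different ones.

```python
import math


def softmax(logits):
    # Numerically stable softmax over a plain list of floats.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def reward_grad(logits, rewards):
    # Gradient of the expected reward sum_a p(a) * r(a) w.r.t. the logits.
    p = softmax(logits)
    baseline = sum(p_i * r_i for p_i, r_i in zip(p, rewards))
    return [p_i * (r_i - baseline) for p_i, r_i in zip(p, rewards)]


def train(logits, rewards, steps=500, lr=0.5):
    # Plain gradient ascent on the logits; returns the final answer distribution.
    z = list(logits)
    for _ in range(steps):
        g = reward_grad(z, rewards)
        z = [z_i + lr * g_i for z_i, g_i in zip(z, g)]
    return softmax(z)


# Hypothetical reward per candidate answer. It looks only at the answer,
# so it is the same whichever way the question and document were ordered.
rewards = [0.0, 1.0, 0.3]

# Hypothetical starting preferences: the policy answers differently per ordering.
start_logits = {
    "question_first": [0.5, 1.5, 0.0],
    "question_last": [2.0, 0.0, 1.0],
}

results = {}
for ordering, logits in start_logits.items():
    before, after = softmax(logits), train(logits, rewards)
    results[ordering] = (before, after)
    print(ordering, "p(best answer) before:", round(before[1], 3),
          "after:", round(after[1], 3))
```

The point is only that an order-invariant reward defines the same target for both orderings. If a real model still behaves differently across orderings after fine-tuning, this toy setup does not explain why.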