
This doesn't replicate using gpt-4o-mini, which always picks Flight B even when Flight A is made somewhat more attractive.

Source: just ran it with 0-20 newlines, 100 trials apiece, raising the temperature and varying the random seed per trial to rule out any prompt caching.
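
If anyone wants to try the same thing, a loop like this works (the flight wording below is a placeholder, not the paper's exact prompt; substitute the real one):

    # pip install openai
    from collections import Counter
    from openai import OpenAI

    client = OpenAI()
    QUESTION = ("Flight A: $400, one stop. Flight B: $450, nonstop. "
                "Which do you book? Answer with exactly one letter, A or B.")

    for n_newlines in range(0, 21):
        prompt = "\n" * n_newlines + QUESTION
        votes = Counter()
        for trial in range(100):
            resp = client.chat.completions.create(
                model="gpt-4o-mini",
                messages=[{"role": "user", "content": prompt}],
                temperature=1.0,  # raised so sampling isn't near-deterministic
                seed=trial,       # different seed per trial
                max_tokens=1,
            )
            votes[resp.choices[0].message.content.strip()] += 1
        print(n_newlines, dict(votes))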



The newline thing is the motivating example in the introduction, using Llama 3 8B Instruct with up to 200 newlines before the question. If you want to reproduce this example with another model, you might have to increase the number of newlines all the way to the context limit. (If you ask the API to give you logprobs, at least you won't have to run multiple trials to get the exact probability.)
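
With the OpenAI API, something like this gives you the answer distribution from a single call (prompt wording is again a placeholder):

    import math
    from openai import OpenAI

    client = OpenAI()
    resp = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user",
                   "content": "\n" * 20 + "Which flight, A or B? Answer with one letter."}],
        max_tokens=1,
        logprobs=True,
        top_logprobs=5,  # also return the runner-up tokens
    )
    # Probability of each candidate first token, no repeated sampling needed
    for cand in resp.choices[0].logprobs.content[0].top_logprobs:
        print(cand.token, math.exp(cand.logprob))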

But the meat of the paper is the Shapley value estimation algorithm in appendix A4. And in A5 you can see that different models giving different results is to be expected.
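
For intuition only (this is the textbook Monte Carlo estimator, not necessarily the paper's A4 algorithm): sample random orderings of the prompt tokens and average each token's marginal contribution, where the value function is something like P(model picks Flight A) with only those tokens kept:

    import random

    def shapley_estimate(n_players, value_fn, n_samples=200):
        # Monte Carlo Shapley: average each player's marginal
        # contribution over random orderings of all players.
        # value_fn maps a frozenset of player indices to a payoff,
        # e.g. P(answer == "A") when only those tokens are kept.
        phi = [0.0] * n_players
        for _ in range(n_samples):
            order = random.sample(range(n_players), n_players)
            coalition = set()
            v_prev = value_fn(frozenset(coalition))
            for i in order:
                coalition.add(i)
                v_new = value_fn(frozenset(coalition))
                phi[i] += v_new - v_prev
                v_prev = v_new
        return [p / n_samples for p in phi]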



