
My intuition is that the harder a task is for an LLM during training, the more actual compression/learning gets encoded in its weights. With multi-token/diffusion objectives it becomes much easier to "reward/loss hack" your way through. This won't matter much during pretraining, but I assume a lot of "cheating" will happen in the finetune/RL phase.
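
A rough way to picture the multi-token objective being discussed (a toy sketch only, not any specific model's training code; the model, sizes, and names below are made-up assumptions):

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    VOCAB, DIM, HEADS = 100, 32, 4  # HEADS = how many future tokens are predicted per step

    class ToyLM(nn.Module):
        def __init__(self):
            super().__init__()
            self.embed = nn.Embedding(VOCAB, DIM)
            self.rnn = nn.GRU(DIM, DIM, batch_first=True)
            # one output head per future position; HEADS == 1 is plain next-token prediction
            self.heads = nn.ModuleList(nn.Linear(DIM, VOCAB) for _ in range(HEADS))

        def forward(self, tokens):
            h, _ = self.rnn(self.embed(tokens))
            return [head(h) for head in self.heads]  # list of (batch, time, VOCAB) logits

    def multi_token_loss(logits_per_head, tokens):
        # Head k predicts token t+k+1 from position t.
        losses = []
        for k, logits in enumerate(logits_per_head):
            preds = logits[:, : tokens.size(1) - (k + 1)]
            targets = tokens[:, k + 1 :]
            losses.append(F.cross_entropy(preds.reshape(-1, VOCAB), targets.reshape(-1)))
        # Averaging over heads lets easy far-horizon terms partly offset a weak
        # next-token term, which is one place the "loss hacking" worry could show up.
        return torch.stack(losses).mean()

    model = ToyLM()
    batch = torch.randint(0, VOCAB, (2, 16))
    print(multi_token_loss(model(batch), batch))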

