SepLLM: Accelerate LLMs by Compressing One Segment into One Separator (sepllm.github.io)
39 points by limoce 10 months ago | 2 comments


This paper seems like it misses the forest for the trees. The analysis is certainly interesting and the proposal sounds viable, sort of like a sliding window attention with a little more history.
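To make the "sliding window with a little more history" comparison concrete, here is a rough sketch of such an attention mask. It is only my reading of the idea, not the paper's implementation: the function name `sparse_mask`, the parameters `n_initial` and `window`, and the toy separator set are all made up for illustration.

```python
def sparse_mask(tokens, separators, n_initial=1, window=2):
    """Return mask[i][j] == True if query position i may attend to key j.

    Hypothetical illustration: a causal mask that keeps only
      - the first `n_initial` tokens,
      - the last `window` tokens (the sliding window, including i itself),
      - every earlier separator token (the "extra history").
    """
    n = len(tokens)
    mask = [[False] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):  # causal: only keys at or before the query
            is_initial = j < n_initial
            is_local = i - j < window
            is_separator = tokens[j] in separators
            mask[i][j] = is_initial or is_local or is_separator
    return mask


tokens = ["The", "cat", "sat", ".", "It", "slept", "."]
mask = sparse_mask(tokens, separators={".", ","})

# Print one row per query token; "x" marks a key that stays visible.
for tok, row in zip(tokens, mask):
    print(f"{tok:>6} " + "".join("x" if keep else "." for keep in row))
```

With these toy settings, the last token still sees the first token, the earlier "." and its immediate neighbor, while "cat" and "sat" have dropped out of view.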

But if it is true that the separators contribute the most towards the attention scores, wouldn't that imply that the tokenization scheme can be improved? Introducing a compression scheme seems like patching around the problem, compared with having the model naturally produce a more evenly spread attention distribution.


Or, put another way:

"Why waste time say lot token when few token do trick?"

-Kevin Malone



