xpl on Dec 17, 2024 | on: New LLM optimization technique slashes memory cost...

Is there any intuition for why it even works? It seems very unexpected.

cs702 on Dec 17, 2024

The intuition is that the relative frequency at which past tokens get attention from future tokens is a good proxy for their relative importance.
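To make that concrete, here is a toy sketch of the idea, not the authors' method. The function names, the 0.1 threshold, and the attention matrix are all made up for illustration. It counts how often each past token gets non-trivial attention from later queries, then keeps only the most frequently attended tokens in the cache.

```python
def attention_frequency(attn, threshold=0.1):
    """attn[q][k] is the attention weight from query position q to key
    position k (causal, so only k <= q is meaningful).

    Returns, for each key token, the fraction of queries at or after it
    that attend to it with weight above `threshold` (an assumed cutoff).
    """
    n = len(attn)
    freqs = []
    for k in range(n):
        queries = range(k, n)  # only these queries can see token k
        hits = sum(1 for q in queries if attn[q][k] > threshold)
        freqs.append(hits / len(queries))
    return freqs


def tokens_to_keep(attn, budget, threshold=0.1):
    """Indices of the `budget` most frequently attended tokens, in order."""
    freqs = attention_frequency(attn, threshold)
    ranked = sorted(range(len(freqs)), key=lambda k: freqs[k], reverse=True)
    return sorted(ranked[:budget])


# Hypothetical 4-token causal attention matrix (each row sums to 1).
toy_attn = [
    [1.00, 0.00, 0.00, 0.00],
    [0.60, 0.40, 0.00, 0.00],
    [0.55, 0.05, 0.40, 0.00],
    [0.50, 0.05, 0.05, 0.40],
]

kept = tokens_to_keep(toy_attn, budget=2)
```

In this toy matrix, token 1 is mostly ignored by later queries, so it is one of the tokens that falls out of a two-token budget.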

The model the authors use, in fact, maps attention scores to features in the frequency domain.
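As a rough illustration of what "features in the frequency domain" could look like, treat the attention one token receives over successive queries as a time series and take the magnitudes of its discrete Fourier transform. This is only a sketch under my own assumptions: the naive DFT, the function name, and the two example series are hypothetical, and the transform the authors actually use may differ.

```python
import cmath


def dft_magnitudes(series):
    """Magnitudes of the non-negative-frequency DFT bins of a real series.

    Naive O(n^2) DFT, written out for clarity rather than speed.
    """
    n = len(series)
    mags = []
    for f in range(n // 2 + 1):
        coeff = sum(
            x * cmath.exp(-2j * cmath.pi * f * t / n)
            for t, x in enumerate(series)
        )
        mags.append(abs(coeff))
    return mags


# Hypothetical attention histories for two tokens over four queries.
steady = [0.5, 0.5, 0.5, 0.5]        # attended to equally every step
alternating = [0.9, 0.1, 0.9, 0.1]   # attended to every other step

steady_features = dft_magnitudes(steady)
alternating_features = dft_magnitudes(alternating)
```

Both series have the same total attention, so their zero-frequency bins match, but the alternating one also has energy in the highest-frequency bin. A downstream model could use that difference to tell the two access patterns apart.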
