
It seems he brushes over the positional encoding, which for me was the most puzzling part of transformers. The way I understood it, positional encoding is much like dates. Just like dates, there are repeating minutes, hours, days, months... etc. Each of these values has a shorter 'wavelength' than the next. The values are then used to identify the position of each token. Like, 'oh, I'm seeing January 5th tokens. I'm January 4th. This means this is after me'. Of course the real positional encoding is much smoother and doesn't have abrupt ends like dates/times, but I think this was the original motivation for positional encodings.
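The multi-wavelength idea maps directly onto the sinusoidal encoding from the original "Attention Is All You Need" paper: each pair of dimensions oscillates at a different frequency, from fast ("minutes") to slow ("months"). A minimal NumPy sketch (function name and shapes are my own choices):

```python
import numpy as np

def positional_encoding(seq_len, d_model):
    """Sinusoidal positional encoding: one row per position, one
    sinusoid pair per frequency. Wavelengths form a geometric
    progression from 2*pi up to 10000*2*pi."""
    pos = np.arange(seq_len)[:, None]        # (seq_len, 1)
    i = np.arange(0, d_model, 2)[None, :]    # (1, d_model/2) even dims
    angles = pos / (10000 ** (i / d_model))  # lower i -> faster oscillation
    pe = np.zeros((seq_len, d_model))
    pe[:, 0::2] = np.sin(angles)             # even dims: sine
    pe[:, 1::2] = np.cos(angles)             # odd dims: cosine
    return pe
```

Because every value stays in [-1, 1], the encoding can simply be added to the token embeddings without swamping them.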


That's one way to think about it.

It's a clever way to encode "position in sequence" as a smooth signal that can be added to each input vector. You might appreciate this detailed explanation: https://towardsdatascience.com/master-positional-encoding-pa...

Incidentally, you can encode dates (e.g. day of week) in a model as sin(day of week) and cos(day of week) to ensure that "day 7" is mathematically adjacent to "day 1".
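A quick sketch of that cyclic trick with the standard library (day numbering 1–7 is my assumption):

```python
import math

def encode_day_of_week(day):
    """Map day 1..7 onto the unit circle so the cycle wraps:
    day 7 lands right next to day 1."""
    angle = 2 * math.pi * day / 7
    return math.sin(angle), math.cos(angle)

def dist(a, b):
    """Euclidean distance between two (sin, cos) points."""
    return math.hypot(a[0] - b[0], a[1] - b[1])
```

With a plain integer feature, days 7 and 1 are six units apart; on the circle, day 7 is closer to day 1 than to mid-week days, which is usually what you want a model to learn.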



