
I would assume it's something similar to concatenating multiple frames (or attention maps?) along the channel dimension and then shifting values around, so that the convolution has access to some channels from other video frames.

I worked on a similar idea a few years ago, using this paper as a reference, and it worked extremely well for consistency and also helped with flicker. https://arxiv.org/abs/1811.08383
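
To make the channel-shift idea concrete, here is a rough sketch of what I mean, not the exact code I used back then. It assumes each frame is a list of per-channel values, that the first 1/fold_div of the channels are pulled from the next frame and the second 1/fold_div from the previous frame, and that missing neighbours at the clip boundaries are zero-padded. The name temporal_shift and the fold_div parameter are my own placeholders.

```python
def temporal_shift(frames, fold_div=8):
    """Shift a fraction of channels along the time axis.

    frames: list of frames, each a list of per-channel values
            (scalars here for illustration; feature maps in practice).
    fold_div: assumed split factor; 1/fold_div of the channels come from
              the next frame, another 1/fold_div from the previous frame.
    """
    num_frames = len(frames)
    num_channels = len(frames[0])
    fold = num_channels // fold_div

    shifted = []
    for t in range(num_frames):
        row = []
        for c in range(num_channels):
            if c < fold:
                src = t + 1  # first group: take the value from the next frame
            elif c < 2 * fold:
                src = t - 1  # second group: take the value from the previous frame
            else:
                src = t      # remaining channels stay in place
            # Assumption: zero-pad where the neighbouring frame does not exist.
            row.append(frames[src][c] if 0 <= src < num_frames else 0)
        shifted.append(row)
    return shifted


# Three frames with four channels each; value = 10 * frame_index + channel_index.
frames = [[10 * t + c for c in range(4)] for t in range(3)]
shifted = temporal_shift(frames, fold_div=4)
```

After the shift, a plain per-frame 2D convolution sees a mix of channels from the current, previous, and next frame, which is where the temporal information comes from without any extra parameters.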



