Hacker News

I think NATTEN does not support cross-attention; I wonder if the authors have tried any text-conditioned cases? Can cross-attention only be added alongside regular attention, or can it be added through adanorm?


cross-attention doesn't need to involve NATTEN. there's no neighbourhood involved because it's not self-attention, so you can do it the stable-diffusion way: after self-attention, run torch's scaled_dot_product_attention with Q=image tokens and K=V=text tokens.
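a minimal sketch of what that looks like, assuming PyTorch 2.x; the module/dimension names here are made up for illustration, not from HDiT or stable diffusion itself:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class CrossAttention(nn.Module):
    """Hypothetical stable-diffusion-style cross-attention block:
    queries come from image tokens, keys/values from text tokens,
    so no neighbourhood (NATTEN) machinery is needed."""
    def __init__(self, dim, text_dim, heads=8):
        super().__init__()
        self.heads = heads
        self.to_q = nn.Linear(dim, dim, bias=False)
        self.to_k = nn.Linear(text_dim, dim, bias=False)
        self.to_v = nn.Linear(text_dim, dim, bias=False)
        self.to_out = nn.Linear(dim, dim)

    def forward(self, x, text):
        # x: (batch, n_img_tokens, dim); text: (batch, n_text_tokens, text_dim)
        b, n, d = x.shape
        head_dim = d // self.heads
        q = self.to_q(x).view(b, n, self.heads, head_dim).transpose(1, 2)
        k = self.to_k(text).view(b, -1, self.heads, head_dim).transpose(1, 2)
        v = self.to_v(text).view(b, -1, self.heads, head_dim).transpose(1, 2)
        # plain global attention over the text tokens
        out = F.scaled_dot_product_attention(q, k, v)
        out = out.transpose(1, 2).reshape(b, n, d)
        return self.to_out(out)
```

you'd call this right after the self-attention sublayer, e.g. `x = x + cross_attn(x, text_embeds)`.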

I tried adding stable-diffusion-style cross-attn to HDiT, text-conditioning on small class-conditional datasets (Oxford Flowers), embedding the class labels as text prompts with Phi-1.5. trained it for a few minutes, and the images were relevant to the prompts, so it seemed to be working fine.

but if instead of a text condition you have a single-token condition (a class label), then yeah, adanorm would be the simpler way.
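for the single-token case, a minimal adaptive-norm sketch, assuming PyTorch; names are illustrative, not HDiT's actual implementation:

```python
import torch
import torch.nn as nn

class AdaLayerNorm(nn.Module):
    """Hypothetical adanorm-style conditioning: a single condition vector
    (e.g. a class embedding) predicts per-channel scale and shift that
    modulate a parameter-free LayerNorm."""
    def __init__(self, dim, cond_dim):
        super().__init__()
        self.norm = nn.LayerNorm(dim, elementwise_affine=False)
        self.to_scale_shift = nn.Linear(cond_dim, 2 * dim)

    def forward(self, x, cond):
        # x: (batch, tokens, dim); cond: (batch, cond_dim)
        scale, shift = self.to_scale_shift(cond).chunk(2, dim=-1)
        return self.norm(x) * (1 + scale.unsqueeze(1)) + shift.unsqueeze(1)
```

this conditions every token on the label without adding any attention over a text sequence.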



