Hacker News | topwalktown's comments

Transformers like Llama use rotary embeddings, which are applied in every single attention layer.

https://github.com/huggingface/transformers/blob/222505c7e4d...
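For context: RoPE rotates each (query, key) channel pair by an angle proportional to the token's position, so attention scores end up depending on relative offsets. A minimal NumPy sketch of that rotation (function name and layout are mine; the linked HuggingFace code uses a cached cos/sin formulation):

```python
import numpy as np

def rope(x, base=10000.0):
    # x: (seq_len, dim) with even dim. Split channels into two halves and
    # rotate each (x1[i], x2[i]) pair by a position-dependent angle.
    seq_len, dim = x.shape
    half = dim // 2
    freqs = base ** (-np.arange(half) / half)         # per-pair frequency
    angles = np.outer(np.arange(seq_len), freqs)      # (seq_len, half)
    cos, sin = np.cos(angles), np.sin(angles)
    x1, x2 = x[:, :half], x[:, half:]
    return np.concatenate([x1 * cos - x2 * sin,
                           x1 * sin + x2 * cos], axis=1)
```

Because it is a pure rotation, it is the identity at position 0 and preserves the norm of every query/key vector.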


Very interesting! Do you know if there have been any studies on whether this improves performance?


I'm trying to train a variable-resolution ViT using I-JEPA. I'm currently topping out at about 30% on ImageNet-1k after training for 20 epochs (6 hours).

It'd be cool to have some help and feedback. I'm on the right track to a really killer setup that's super fast to train, but it needs more evaluations and more tuning. Anyone interested?


I experimented with using a (mostly) unmodified Llama model to generate images by training on the bits from a lossy compression algorithm. It turns out the key is having a decoder that can give 'hints' to the autoregressive model, as conditioning information, about what the decoder is going to do with the next token in the stream.
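A rough sketch of that conditioning idea, in NumPy. All names here are hypothetical (nothing below is from the actual project): the decoder emits a per-position hint vector, and the language model simply sees token embedding plus projected hint as its input:

```python
import numpy as np

rng = np.random.default_rng(0)
vocab, d_model, hint_dim = 256, 64, 16

tok_emb = rng.normal(size=(vocab, d_model))     # token embedding table
hint_proj = rng.normal(size=(hint_dim, d_model))  # projects hints into d_model

def conditioned_inputs(token_ids, hints):
    # token_ids: (seq,) ints; hints: (seq, hint_dim) from the decoder.
    # The LM body stays unmodified; only its input embeddings change.
    return tok_emb[token_ids] + hints @ hint_proj
```

With zero hints this reduces exactly to plain token embeddings, which is one reason an (almost) unmodified LM can absorb the extra signal.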

Thanks!


Yeah, check out the Emu paper by Meta. They do basically all of what is mentioned in the comment above.


Quantization also works as regularization; it stops the neural network from being able to use arbitrarily complex internal rules.

But it's really only useful if you absolutely need a discrete embedding space for some sort of downstream usage. VQ-VAEs can be difficult to get to converge; they have problems stemming from the approximation of the gradient, like codebook collapse.


I wrote a short article about JPEG and whether we could use concepts from how it works to make an image autoencoder that has a left-to-right positional bias and variable compression.

Basically, existing VAEs are pretty good at compression, but they have bad properties like a 2D latent position bias and difficulty training on batches of mixed resolutions.

So I tried something I call DCT-Autoencoder, which takes ideas from JPEG to learn compression of patched DCT features of an image.
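The patch transform JPEG builds on is a 2D DCT-II of each square block. A minimal NumPy sketch using the orthonormal DCT basis (names mine; the actual DCT-Autoencoder details are in the article):

```python
import numpy as np

def dct_matrix(n):
    # Orthonormal DCT-II basis: row k, column m is
    # sqrt(2/n) * cos(pi * (2m + 1) * k / (2n)), with row 0 scaled by 1/sqrt(2).
    k = np.arange(n)
    M = np.cos(np.pi * (2 * k[None, :] + 1) * k[:, None] / (2 * n))
    M[0] *= 1 / np.sqrt(2)
    return M * np.sqrt(2 / n)

def dct2(patch):
    # Separable 2D DCT of a square patch (JPEG uses 8x8 blocks).
    M = dct_matrix(patch.shape[0])
    return M @ patch @ M.T
```

Since the basis is orthonormal, `M.T @ coeffs @ M` inverts the transform exactly; compression comes later, from quantizing and discarding the small (mostly high-frequency) coefficients.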

Check it out!

