I include the link to the Colab, which means it's trained for free on Google's m...

minimaxir · on Jan 23, 2020

Training the smallest GPT-2 model uses about 11-12GB of GPU VRAM; consumer GPUs cap out at about 8GB.

GPT-2 1.5B will definitely not train on a consumer GPU.
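
A back-of-envelope sketch of why the 1.5B model is out of reach. It assumes fp32 values, the Adam optimizer (two extra buffers per weight), and approximate parameter counts of 124M for the smallest GPT-2 and 1.5B for the largest. None of these assumptions come from the comment above. It counts only weights, gradients, and optimizer state; activations and framework overhead, which would make up the rest of the 11-12GB figure, are not modeled.

```python
BYTES_FP32 = 4  # assumption: everything is stored in 32-bit floats


def training_state_gb(n_params, bytes_per_value=BYTES_FP32):
    """Rough GB needed for weights + gradients + Adam's two moment buffers."""
    values_per_param = 4  # weight, gradient, Adam first moment, Adam second moment
    return n_params * values_per_param * bytes_per_value / 1e9


# Approximate parameter counts (assumed, not taken from the thread).
small_gb = training_state_gb(124e6)
xl_gb = training_state_gb(1.5e9)

print(f"GPT-2 small: ~{small_gb:.1f} GB before activations")
print(f"GPT-2 1.5B:  ~{xl_gb:.1f} GB before activations")
```

Under these assumptions the 1.5B model needs about 24GB for parameter state alone, well past an 8GB or 11GB card before a single activation is stored.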

Tenoke · on Jan 23, 2020

You can't train the full thing, but you can freeze everything except the transformer layers (which is what shawwwn and gwern do anyway even though they do have the memory). You also need gradient checkpointing of course.
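
A minimal sketch of the freezing idea: only variables belonging to the transformer blocks are handed to the optimizer, so no gradients or optimizer state are kept for the embeddings. The variable names below are illustrative and assume a GPT-2 style checkpoint layout where block variables live under `model/h<N>/`; the `select_trainable` helper is hypothetical. Gradient checkpointing is a separate step and is not shown.

```python
def select_trainable(var_names, marker="/h"):
    """Keep only variables whose name marks them as part of a transformer block."""
    return [name for name in var_names if marker in name]


# Illustrative variable names (assumed layout, not read from a real checkpoint).
all_vars = [
    "model/wte",               # token embeddings
    "model/wpe",               # position embeddings
    "model/h0/attn/c_attn/w",  # block 0 attention weights
    "model/h0/mlp/c_fc/w",     # block 0 MLP weights
    "model/h1/ln_1/g",         # block 1 layer norm gain
    "model/ln_f/g",            # final layer norm gain
]

trainable = select_trainable(all_vars)
frozen = [name for name in all_vars if name not in trainable]

print("trainable:", trainable)
print("frozen:   ", frozen)
```

In a real training script the `trainable` list would be what gets passed to the optimizer; everything in `frozen` keeps its pretrained values.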

sroussey · on Jan 23, 2020

Can anything be done on a mobile device yet?

Tenoke · on Jan 23, 2020

Yes, there are a lot of models designed to work okay on mobile. Though you'd typically train in the cloud and only use the trained model on the phone. Alternatively, you can train over many phones, which brings a lot of extra challenges but is definitely possible.

Google's very new Reformer[0] would likely be your best bet if you want both something truly cutting-edge and have less compute, even as little as a mobile's. As far as I know, it hasn't been used on phones yet (again, it's very new) but I bet it can be done.

0. https://ai.googleblog.com/2020/01/reformer-efficient-transfo...
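
One common way to "train over many phones" is federated averaging, sketched below. This is an assumption about what the comment has in mind, not something it states. Each phone trains locally and sends back its weights, and a server combines them, weighted by how many examples each phone trained on. The `federated_average` helper and the toy numbers are hypothetical.

```python
def federated_average(client_weights, client_sizes):
    """Average per-client weight vectors, weighted by each client's example count."""
    total = sum(client_sizes)
    dim = len(client_weights[0])
    return [
        sum(w[i] * n for w, n in zip(client_weights, client_sizes)) / total
        for i in range(dim)
    ]


# Two toy "phones": the second one trained on three times as much data.
phone_weights = [[1.0, 2.0], [3.0, 4.0]]
phone_sizes = [1, 3]

global_weights = federated_average(phone_weights, phone_sizes)
print(global_weights)
```

The raw data never leaves the device in this scheme; only the weight updates are uploaded. The extra challenges mentioned above (unreliable clients, uneven data, communication cost) are not handled by this sketch.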

sroussey · on Jan 23, 2020

Interesting! Thank you for the link.

I don’t mind training on a desktop and using the model on both desktop and mobile. We kinda already have that problem: we parse Google data for a given Android phone, but the phone doesn’t have the memory or compute for the amount of data it has generated over the years, and the user will background the app too quickly. So we need to ask the desktop app to do it, process there, and sync results back.

cyorir · on Jan 23, 2020

Note that on the extreme end of consumer GPUs, there is the 2080 Ti which comes with 11GB.

CMCDragonkai · on Jan 24, 2020

Is it possible to slice the model up between multiple GPUs?

sroussey · on Jan 23, 2020

Yeah, I don’t want to upload.

I would really like to have my app learn the user’s speaking style from their data and be able to write out diary entries each day in their own “voice”.