
I include the link to the Colab, which means it's trained for free on Google's machines, and you just access it from your browser.

Of course, you might not want to have sensitive data on Google's machines for one reason or another, in which case you'd have to buy an external GPU, or better yet a whole other machine.




Training the smallest GPT-2 model uses about 11-12GB of GPU VRAM; consumer GPUs cap out at about 8GB.
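For a rough sense of where that memory goes, here is a back-of-the-envelope sketch. It assumes fp32 values and an Adam-style optimizer with two extra slots per weight (my assumptions, not something stated above), and uses the 124,439,808 parameter count of the smallest GPT-2. The function name is made up for illustration.

```python
def fixed_training_bytes(n_params, bytes_per_value=4, optimizer_slots=2):
    """Bytes needed for weights + gradients + optimizer state.

    Assumes one gradient value per weight and `optimizer_slots` extra
    values per weight (2 for Adam: first and second moment estimates).
    Activations are NOT included; they depend on batch size and sequence length.
    """
    copies_per_param = 1 + 1 + optimizer_slots  # weights, grads, optimizer slots
    return n_params * bytes_per_value * copies_per_param


N_PARAMS_SMALL = 124_439_808  # smallest GPT-2

fixed = fixed_training_bytes(N_PARAMS_SMALL)
fixed_gib = fixed / 2**30
print(f"weights + grads + optimizer state: {fixed_gib:.2f} GiB")
```

Under these assumptions the fixed part comes to roughly 2GB, so most of the quoted 11-12GB would presumably be activations kept around for the backward pass.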

GPT-2 1.5B will definitely not train on a consumer GPU.


You can't train the full thing, but you can freeze everything except the transformer layers (which is what shawwwn and gwern do anyway even though they do have the memory). You also need gradient checkpointing of course.
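To make the "freeze everything except the transformer layers" idea concrete, here is a minimal, framework-free sketch that only counts parameters. The variable names (`model/wte`, `model/h0/...`) are modeled on the naming in the released TensorFlow GPT-2 code as I remember it, and the `"/h" in name` filter is my illustration of the idea, so treat both as assumptions rather than anyone's actual training script.

```python
import math


def gpt2_small_shapes(n_layer=12, n_embd=768, n_vocab=50257, n_ctx=1024):
    """Return {variable_name: shape} for a GPT-2-small-sized model (names are assumed)."""
    shapes = {
        "model/wte": (n_vocab, n_embd),  # token embeddings
        "model/wpe": (n_ctx, n_embd),    # position embeddings
    }
    for i in range(n_layer):
        p = f"model/h{i}"  # one transformer block
        shapes.update({
            f"{p}/ln_1/g": (n_embd,), f"{p}/ln_1/b": (n_embd,),
            f"{p}/attn/c_attn/w": (n_embd, 3 * n_embd),
            f"{p}/attn/c_attn/b": (3 * n_embd,),
            f"{p}/attn/c_proj/w": (n_embd, n_embd),
            f"{p}/attn/c_proj/b": (n_embd,),
            f"{p}/ln_2/g": (n_embd,), f"{p}/ln_2/b": (n_embd,),
            f"{p}/mlp/c_fc/w": (n_embd, 4 * n_embd),
            f"{p}/mlp/c_fc/b": (4 * n_embd,),
            f"{p}/mlp/c_proj/w": (4 * n_embd, n_embd),
            f"{p}/mlp/c_proj/b": (n_embd,),
        })
    shapes["model/ln_f/g"] = (n_embd,)  # final layer norm
    shapes["model/ln_f/b"] = (n_embd,)
    return shapes


shapes = gpt2_small_shapes()
total = sum(math.prod(s) for s in shapes.values())

# Keep only variables inside a transformer block (model/h0, model/h1, ...);
# embeddings and the final layer norm stay frozen.
trainable_names = [name for name in shapes if "/h" in name]
trainable = sum(math.prod(shapes[name]) for name in trainable_names)

print(f"total params:     {total:,}")
print(f"trainable params: {trainable:,} ({trainable / total:.0%})")
```

In this sketch the frozen variables still occupy memory as weights, but they no longer need gradients or optimizer state, which is where the saving would come from.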


Can anything be done on a mobile device yet?


Yes, there are a lot of models designed to work okay on mobile, though you'd typically train in the cloud and only run the trained model on the phone. Alternatively, you can train across many phones, which brings a lot of extra challenges but is definitely possible.

Google's very new Reformer[0] would likely be your best bet if you want something truly cutting-edge but have little compute, even as little as a phone's. As far as I know, it hasn't been used on phones yet (again, it's very new), but I bet it can be done.

0. https://ai.googleblog.com/2020/01/reformer-efficient-transfo...


Interesting! Thank you for the link.

I don’t mind training on a desktop and using the model on both desktop and mobile. We kinda already have that problem: we parse Google data for a given Android phone, but the phone doesn’t have the memory or compute for the amount of data it has generated over the years, and the user will background the app too quickly. So we need to ask the desktop app to do it, process there, and sync results back.


Note that on the extreme end of consumer GPUs, there is the 2080 Ti which comes with 11GB.


Is it possible to slice the model up between multiple GPUs?


Yeah, I don’t want to upload.

I would really like to have my app learn the user’s speaking style from their data and be able to write out diary entries each day in their own “voice”.



