This is so fun. A question for you (or anyone else familiar with this topic): what hardware would you recommend for someone just getting into training GPT-2 models? Would a Radeon RX 580 be enough?
You cannot train any GPT-2 models with an AMD GPU. Nvidia's CUDA is still the de facto toolkit for deep learning.
Either use Colab (free) or a preemptible GPU instance on GCE with the Deep Learning VM image (relatively cheap). Using consumer GPUs is a recipe for frustration.
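If you go the GCE route, spinning one up looks roughly like this. The instance name, zone, GPU type, and image family below are just placeholders I picked for illustration; check the current Deep Learning VM docs before copying anything:

    # Rough sketch: a preemptible Deep Learning VM with one T4 attached.
    # Names/zone/GPU/image-family are assumptions, not a recommendation.
    gcloud compute instances create gpt2-box \
        --zone=us-central1-a \
        --machine-type=n1-standard-8 \
        --accelerator=type=nvidia-tesla-t4,count=1 \
        --image-family=pytorch-latest-gpu \
        --image-project=deeplearning-platform-release \
        --maintenance-policy=TERMINATE \
        --metadata=install-nvidia-driver=True \
        --boot-disk-size=200GB \
        --preemptible

Preemptible means it can be killed at any time, so checkpoint often; the upside is it costs a fraction of an on-demand GPU instance.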
>You cannot train any GPT-2 models with an AMD GPU.
It seems like you can. I know of at least one person who has finetuned the 1.5B model on a 16 GB AMD card. I think u/sillysaurusx had some part in it, and apparently translating the code from CUDA was fairly easy.
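For what it's worth, the ROCm build of PyTorch exposes the same torch.cuda API as the CUDA build, which is probably why so little translating was needed. A minimal sketch with Hugging Face transformers (using the small 124M model here; this is not the exact code that 1.5B run used, and 1.5B on 16 GB needs extra memory tricks I won't guess at):

    # Sketch assuming the ROCm build of PyTorch and the transformers library;
    # not the code the person above actually ran.
    import torch
    from transformers import GPT2LMHeadModel, GPT2Tokenizer

    # On ROCm, torch.cuda.is_available() reports the AMD GPU and "cuda"
    # tensors live on it, so CUDA-targeted training code mostly runs as-is.
    device = "cuda" if torch.cuda.is_available() else "cpu"

    tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
    model = GPT2LMHeadModel.from_pretrained("gpt2").to(device)
    optimizer = torch.optim.AdamW(model.parameters(), lr=5e-5)

    # One toy fine-tuning step on a single example.
    batch = tokenizer("Hello, world", return_tensors="pt").to(device)
    loss = model(**batch, labels=batch["input_ids"]).loss
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()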