I always wonder how people figure out these successful gigantic models if it takes hundreds of TPUs and days to train them.

I recently bought an RTX 3090 in hopes of playing around with some computer vision applications, but I guess having 24GB of VRAM is nothing if I want to get something SOTA working.




The EfficientNet paper has some good things to say on this.

If you're working at a place with giant datacenters full of (T/G)PUs, you can train one giant model a few times, or train smaller models hundreds of times. Without a hyperparameter search, there's a really high chance that you're just looking in the wrong region of the design space and will wind up with something gigantic but kinda meh.

So, the simple strategy is to use the smaller models to find a great mix of hyperparameters, and then scale up to a gigantic model. The EfficientNet paper demonstrates some fairly reliable ways to scale up the model, growing depth, width, and input resolution together according to a compound scaling factor.
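To make that concrete, the compound scaling rule looks roughly like this (the alpha/beta/gamma values are the ones reported in the EfficientNet paper for the B0 baseline; the toy baseline numbers below are just illustrative):

    # EfficientNet-style compound scaling: one knob (phi) grows depth, width,
    # and input resolution together. Coefficients from the paper, found by a
    # small grid search on the baseline, with alpha * beta**2 * gamma**2 ~= 2.
    def compound_scale(base_depth, base_width, base_resolution, phi):
        alpha, beta, gamma = 1.2, 1.1, 1.15
        depth = round(base_depth * alpha ** phi)              # more layers
        width = round(base_width * beta ** phi)               # more channels
        resolution = round(base_resolution * gamma ** phi)    # bigger inputs
        return depth, width, resolution

    # e.g. a toy baseline scaled up by phi=3 (roughly 2**3 the FLOPs)
    print(compound_scale(base_depth=18, base_width=64, base_resolution=224, phi=3))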

But yeah, even for smaller model footprints, the ability to run tens of experiments in parallel goes a very long way. If you've got a single GPU to play with, I would instead focus on a well-scoped, interesting question that you can answer without having to demonstrate SOTA-ness, because that will be an uphill climb.

Also remember that it's good to lean heavily on pre-trained models to save time. Anything you can do to iterate faster, really.
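For example, something like this goes a long way on a single GPU (a rough sketch with torchvision, assuming a recent version; the 10-class head is just a placeholder):

    import torch
    import torchvision

    # Use a pretrained backbone as a frozen feature extractor: only the small
    # new head gets gradients, so each experiment is cheap and fast to run.
    backbone = torchvision.models.resnet50(weights="IMAGENET1K_V2")
    for p in backbone.parameters():
        p.requires_grad = False
    backbone.fc = torch.nn.Linear(backbone.fc.in_features, 10)  # new 10-class head

    optimizer = torch.optim.AdamW(backbone.fc.parameters(), lr=1e-3)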


The RTX 3090 is a beast compared to what researchers had available to them just a few years ago.

Don't try to chase SOTA - that's a fruitless endeavour.

24GB of VRAM is plenty for CV, and you can train some excellent models with it. Keep in mind that you don't necessarily need to train models from scratch, either.

You can achieve great things by downloading a well-tested, pretrained model and fine-tuning it for your particular task or application. Trying to come up with new architectures and training them from scratch is an exercise in futility for really big models.
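A minimal sketch of what that can look like with torchvision (assuming a recent version; the 5-class head and the dummy batch stand in for your own data):

    import torch
    import torchvision

    # Start from a well-tested pretrained model and swap in a new head.
    model = torchvision.models.efficientnet_b0(weights="IMAGENET1K_V1")
    model.classifier[-1] = torch.nn.Linear(model.classifier[-1].in_features, 5)

    # Small learning rate so the pretrained weights are nudged, not overwritten.
    optimizer = torch.optim.AdamW(model.parameters(), lr=3e-5, weight_decay=1e-4)
    loss_fn = torch.nn.CrossEntropyLoss()

    images = torch.randn(8, 3, 224, 224)   # stand-in for a real training batch
    labels = torch.randint(0, 5, (8,))
    model.train()
    optimizer.zero_grad()
    loss = loss_fn(model(images), labels)
    loss.backward()
    optimizer.step()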

I usually only train smaller models (a couple of million parameters), and training and fine-tuning usually take anywhere from a few hours to a day or two. But then again, my hardware is two generations older than yours.


The real research problem is being able to buy a 3090.


So many papers casually list their hyperparameters, neglecting to mention that those specific values are often essential to the reported performance. Something you don't realize unless you play around with their code…



