That wasn't what he used the word for. I understood his point perfectly: there are AI teams that aren't knowledgeable or skilled enough to modify and enhance the Docker images or toolkits that train/run the models. It takes medium-to-advanced skills to get the drivers working properly. He used the shorthand "too stupid to" instead of what I wrote above.
Still, it adds an air of arrogance to the whole post. For a while, the only PyTorch build that worked on the newly released Hopper GPUs we had was the Nvidia NGC container, not the PyTorch nightly. The upstream ecosystem hadn't caught up yet, and Nvidia were adding their special sauce to their image. Perhaps not stupidity, but a lack of docs from Nvidia.
> For a while, the only PyTorch build that worked on the newly released Hopper GPUs we had was the Nvidia NGC container, not the PyTorch nightly. The upstream ecosystem hadn't caught up yet, and Nvidia were adding their special sauce to their image.
I'm sorry to come across as arrogant, but it's really just frustration, because being surrounded by this kind of cargo-culting "special sauce" talk, even from so-called principal engineers, is what drove me to burnout and out of the industry into the northwoods. Furthermore, you're completely wrong. There is no special sauce; you just didn't look at the list of ingredients. There never has been any special sauce.
The build scripts for the base container are incredibly straightforward: they add the apt/yum repos and then install packages from them.
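To make that concrete, the base-image step boils down to something like this (a simplified shell sketch, not Nvidia's actual script; the repo URL and package names are illustrative):

    # Register Nvidia's CUDA apt repo and its signing key (illustrative;
    # the real scripts parameterize the distro and architecture)
    curl -fsSL https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2204/x86_64/cuda-keyring_1.1-1_all.deb -o /tmp/cuda-keyring.deb
    dpkg -i /tmp/cuda-keyring.deb

    # Then install ordinary packages from that repo
    apt-get update
    apt-get install -y cuda-toolkit-12-1 libcudnn8 libnccl2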
The PyTorch containers are constructed atop these base containers. The specific PyTorch commit used in each NGC PyTorch container is linked directly in the release notes for that container: https://docs.nvidia.com/deeplearning/frameworks/pytorch-rele...
Do I need to keep going? Every single one of these commits is on pytorch/pytorch@main. So when you say:
> For a while, the only PyTorch build that worked on the newly released Hopper GPUs we had was the Nvidia NGC container, not the PyTorch nightly.
That's provably false. If the containers had been built from unmerged Nvidia code, their commit hashes wouldn't appear on main at all, unless you're suggesting upstream PyTorch continually rebased (e.g., force-pushed, breaking the worktree of every PyTorch developer) atop that unmerged code. Meaning all of these commits were merged into pytorch/pytorch@main, and were available in PyTorch nightlies, before those NGC PyTorch containers were released. No secret sauce, no man behind the curtain, just pure cargo culting and superstition.
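If you'd rather check than take my word for it, it only takes two steps; roughly (the container tag is just an example, substitute whichever release you actually use):

    # Print the exact pytorch commit the NGC image was built from
    docker run --rm nvcr.io/nvidia/pytorch:23.10-py3 \
      python -c "import torch; print(torch.__version__, torch.version.git_version)"

    # Then, in a plain pytorch/pytorch checkout, confirm that commit is
    # ordinary upstream history
    git fetch origin main
    git merge-base --is-ancestor <hash printed above> origin/main && echo "on main"

If that second command succeeds, the commit your container was built from is plain upstream history.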