I noticed CUDA 11.0 was almost ready for release last week when I went to install CUDA and the default download page linked to the 11.0 Release Candidate. The 10.1 and 10.2 links were buried behind a link off to the side labeled "legacy". The thing is, no library you use is going to support the CUDA 11.0 RC, which is ridiculous. For example, PyTorch stable is on 10.2 and TensorFlow only goes up to 10.1.
This is generally indicative of how poorly organized the CUDA documentation and installation instructions are. The Conda dependency manager has made this a lot easier recently, especially by providing PyTorch binaries. But if you want to use packages like NVIDIA Apex for mixed-precision DL[0], you're in for a huge headache trying to compile torch from source while also managing your CUDA and nvcc versions, which sometimes must be the same but sometimes cannot be![1]
[0] Yes, I'm aware that Apex was very recently brought into torch but it seems that the performance issues haven't been ironed out yet.
[1] https://stackoverflow.com/questions/53422407/different-cuda-...
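If it helps anyone debug this kind of mess, here's a minimal sketch (plain CUDA runtime API, nothing project-specific assumed) that prints the toolkit version you compiled against, the runtime library actually loaded, and the highest CUDA version the installed driver supports. When these three disagree you get exactly the kind of breakage described above:

    #include <cstdio>
    #include <cuda_runtime.h>

    int main() {
        int runtime = 0, driver = 0;
        cudaRuntimeGetVersion(&runtime);   // version of libcudart actually loaded, e.g. 10020 = 10.2
        cudaDriverGetVersion(&driver);     // highest CUDA version the installed driver supports
        std::printf("compiled against toolkit: %d\n", CUDART_VERSION);  // what nvcc/headers you built with
        std::printf("runtime library         : %d\n", runtime);
        std::printf("driver supports up to   : %d\n", driver);
        return 0;
    }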
Yeah, and the CUDA 10.0 official Visual Studio demo project build was broken for... looks like a year, at least, because they didn't want to populate the toolkit path. NVIDIA, you're better than this.
> The Conda dependency manager has made this a lot easier
Yeah but conda is "Let's do dependency management with a SAT solver, it'll be great!" On a good day, it's just slow. On a bad day, the SAT solver spins for hours before failing to converge. On a really bad day, the SAT solver does something "clever."
I've had a couple of really bad days this year. I'm really starting to not like conda very much.
My biggest gripe with the conda dependency manager is that it doesn't keep track of which packages own which files, and if multiple packages own the same file, the last one to be installed will happily scribble over whatever was there before. With hilarious results, of course.
This means that keeping a conda installation up to date is often very tricky; when upgrading, you frequently have to uninstall and reinstall some packages.
It works better if you start from scratch with an environment.yml file.
Conda's SAT solver for dependency management is the bane of my existence. For pip-installable packages, I'll almost always turn to pip rather than conda even when in a conda environment.
I thought the Debian SAT solver was a maintainer tool rather than something that ran every time? In any case, conda's implementation is really quite awful by comparison, and they would have been well served by copying something that works instead of building something that doesn't.
I have to use containers with nvidia-docker because NVIDIA so consistently and relentlessly breaks things without so much as a glance at backward compatibility.
I moved our Deep Learning servers over to Docker images + JupyterHub DockerSpawners recently because maintaining all the various version dependencies between frameworks was an absolute PITA.
I'm never sure of the relation between the driver, nvidia-docker and the container with a specific cuda version.
Last time I tried it, the CUDA inside the container thought it was using some old driver version while a much newer version was installed on the host. So I had to manually install the older version. Not sure where the issue was, but maybe it was because I was using the deprecated nvidia-docker version 2, which is still needed to pass GPU resources to containers run inside Kubernetes.
The annoying thing is that nvidia-docker is still not great. You still have to deal with the driver installed outside the container, and it makes a big difference.
Furthermore, it seems like even the CUDA runtime is typically not installed in the container, but rather injected by the nvidia-docker container runtime.
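For what it's worth, a minimal sketch using the driver API can make that split visible from inside a container: libcuda.so is the piece that comes from the host driver (the part nvidia-docker mounts/injects), so this reports what the host driver supports regardless of which toolkit the image ships. Build with nvcc probe.cu -o probe -lcuda (the file name is just an example):

    #include <cstdio>
    #include <cuda.h>

    int main() {
        if (cuInit(0) != CUDA_SUCCESS) {
            std::printf("cuInit failed: no usable driver visible in this container\n");
            return 1;
        }
        int version = 0, count = 0;
        cuDriverGetVersion(&version);   // comes from the host's libcuda, e.g. 11000 = CUDA 11.0 capable
        cuDeviceGetCount(&count);
        std::printf("host driver supports CUDA %d.%d, %d device(s) visible\n",
                    version / 1000, (version % 1000) / 10, count);
        return 0;
    }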
You don't have to use nvidia-docker to use CUDA with Docker. I made my own CUDA containers based on Debian and pass the devices to the docker run command. I mount the libcuda and libnvidia libraries as volumes. I think that's what you mean by injecting the runtime.
We use Singularity as our container provider for the exact same reason. For now it has worked great and driver/CUDA updates haven't broken any containers yet over the past 2 years.
Something good about Singularity (which I bet you could also do with Docker) is that it automatically binds the right NVIDIA stuff into the container. It also works fine unprivileged :)
Yeah. To make matters worse, they updated libnccl-dev in their apt repo to be CUDA 11 based a few weeks ago. That breaks my CUDA app (because it is still on 10.2 and not compatible) in interesting ways. I've had libnccl-dev on apt hold for a while, waiting for this release.
Every time I have to deal with multiple versions of CUDA on Linux I feel like poking my eyes out. I get that supporting developer libraries that have to interact with hardware is hard, but come on...
For something this popular it shouldn't be so hard. I don't think being related to hardware is an excuse. CUDA is not a driver and exists entirely in userspace.
This is the kind of thing that happens when you're dealing with a monopoly.
The economic incentive is simple: open-sourcing the driver will allow an open-source API to interact with the hardware, allowing AMD/other competitors to support the same API. So instead of competing at the silicon level, Nvidia chooses to set up unnecessary barriers to entry at massive cost to developers/users.
Unnecessary? Not in the eyes of the shareholders. Just compare NVIDIA's stock surge with the performance of AMD. They protect their market and they do it pretty well.
the Linus video is awesome though :-) And I totally understand his sentiment
Yes - I would think drivers are even harder from an engineering point of view but as far as I know they have fairly good backwards compatibility for games. I think this is likely because people would be much more reluctant to buy new graphics cards if they broke their older games.
Does anyone understand why such minor upgrades resulted in a major version bump? Is this some sort of stability check point? Or some other versioning convention?
It supports a whole new architecture (Ampere) and all the good stuff that comes with it: Multi-Instance GPU partitioning, new number formats (TF32, sparse INT8), 3rd-gen Tensor Cores, and asynchronous copy/asynchronous barriers. These are huge features.
Well, I think a new microarchitecture means a major bump. So between that and version bumps for actual major software features, you get to 11 within 13 years or so.
Also, GCC 9.x compatibility may seem minor to some, but is significant for others. I also think there's some C++17 support in kernels - that's something too.
Ooh, I missed those. Support for C++17 is pretty major. Thanks. Perhaps my memory is fuzzy; I just remember the CUDA 9->10 switch having some significant (but not major) performance and feature changes.
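For anyone curious what "C++17 in kernels" buys you, here's a minimal sketch (just an illustration of the language support, not a performance example) of the kind of device code nvcc from CUDA 11 accepts when built with -std=c++17:

    #include <cstdio>
    #include <cuda_runtime.h>

    template <bool Doubled>
    __global__ void scale(float* data, int n) {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i < n) {
            if constexpr (Doubled)      // C++17: the discarded branch is never instantiated
                data[i] *= 2.0f;
            else
                data[i] *= 0.5f;
        }
    }

    int main() {
        const int n = 8;
        float host[n] = {1, 2, 3, 4, 5, 6, 7, 8};
        float* dev = nullptr;
        cudaMalloc(&dev, n * sizeof(float));
        cudaMemcpy(dev, host, n * sizeof(float), cudaMemcpyHostToDevice);
        scale<true><<<1, n>>>(dev, n);
        cudaMemcpy(host, dev, n * sizeof(float), cudaMemcpyDeviceToHost);
        for (float v : host) std::printf("%.1f ", v);
        std::printf("\n");
        cudaFree(dev);
        return 0;
    }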
I got confused for a sec by "removing Pascal support", as some of the 10XX GPUs are only 2 years old. Looks like Pascal stays, and Maxwell is removed indeed (it was deprecated in 10.2).
You just have to compile that specific supported version of gcc. On an earlier CUDA, I had to compile gcc-4.9. Unlike Debian, Fedora just seems to remove all traces of old packages.
They've come to the conclusion that you should also come to, that Fedora is basically a waste of time to support because whatever you've gotten working will be terribly broken in the next release for no good reason anyone can point to?
That probably just means that they are testing that platform in CI now, and are officially supporting it (taking bug reports into account, etc.).
This has been problematic for me after upgrading from 18.04 to 20.04. Every time I update the cuda or some nvidia package through apt, my X server fails to start for several different reasons.
Just today I've already spent 30 minutes trying to start X with this latest CUDA update. Too bad I can't switch back to the open-source nouveau driver.
Coincidentally I was just looking at this after upgrading from 19.10 to 20.04 today -- basically, the version installed from Nvidia's old 18.04 repos works fine on 20.04.
I'd like to play with CUDA, but I just got a new laptop without an Nvidia GPU, coming from one that had a built-in Nvidia GPU. It's got a Thunderbolt port, but unfortunately most of the GPUs are quite expensive at around $400. Does anyone know any cheaper options?
If you just want to try it for a few hours, you can add GPU(s) to a GCP Compute Engine instance. Along with the trial credits, that should get you a few hours poking around with CUDA.
Otherwise, get a pre-owned GTX 950 (one that doesn't require an external power supply) and a TB3 to PCI-E x16 adapter. Not an enclosure, an adapter. Should cost you around $200 all in, IIRC. And it allows you to upgrade the card further down the line since most of the cost is in the adapter.
Do you know what exactly I should search for when looking for a "TB3 to PCI-E x16 adapter"? Will this utilize all thunderbolt lanes available? I've got a newer laptop that, I believe, has all lanes available.
For tinkering around, just use Google Colab[1]. They offer free hosted Jupyter notebooks and have both Nvidia GPU[2] and Google TPU[3] runtime options available.
Here[4] is a notebook that shows how to install CUDA into an environment using the GPU accelerated runtime.
Only major downside is that resources aren't guaranteed (see first section under "Resource Limits" here[5]), so you sporadically may not be able to start a GPU-accelerated runtime session. But that shouldn't be much of a blocker for tinkering purposes.
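Once you have nvcc available there (e.g. after following the notebook above), a classic first thing to compile is a SAXPY kernel. This is a minimal sketch using managed memory to keep it short, built with something like nvcc saxpy.cu -o saxpy:

    #include <cstdio>
    #include <cuda_runtime.h>

    __global__ void saxpy(int n, float a, const float* x, float* y) {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i < n) y[i] = a * x[i] + y[i];
    }

    int main() {
        const int n = 1 << 20;
        float *x = nullptr, *y = nullptr;
        cudaMallocManaged(&x, n * sizeof(float));   // unified memory keeps the demo short
        cudaMallocManaged(&y, n * sizeof(float));
        for (int i = 0; i < n; ++i) { x[i] = 1.0f; y[i] = 2.0f; }

        saxpy<<<(n + 255) / 256, 256>>>(n, 2.0f, x, y);
        cudaDeviceSynchronize();                    // wait so the host can read managed memory

        std::printf("y[0] = %.1f (expected 4.0)\n", y[0]);
        cudaFree(x);
        cudaFree(y);
        return 0;
    }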
oh great. More chasing to do. Anyone interested in working on the CUDA integration for Go (https://gorgonia.org/cu)? PRs welcome, as I am quite short on time.
Apple doesn't directly compete with CUDA, they just want total control of their platforms. Metal does have great performance and tooling. In practice nobody does HPC on macs so there's no demand for linear algebra or graph libraries, which are a big selling point for Nvidia over AMD.
The fact that Apple is trying to kill OpenGL and OpenCL and block Vulkan definitely sucks though for anyone trying to do indie games, or open source ML/HPC.
Have you looked at the OpenGL APIs? They are a strange legacy beast. I think Apple does what's best for its users, even if it has to create a new standard. How about other companies adapting to modern Metal APIs instead?
No, you need to use Metal, or Vulkan + MoltenVK, both of which suck for compute, but you don't have that much compute available on Apple hardware anyway, so it shouldn't matter that much.