Most research computing in the CS department, and much of our instruction, uses GPUs with Nvidia’s Cuda software and applications such as Pytorch and Tensorflow. This page describes these technologies.
GPUs and their allocation
Most of our research and larger instructional systems have 8 Cuda-capable Nvidia GPUs. Desktop systems generally have one smaller GPU that is still Cuda-capable, and thus could be used for courses in GPU programming or preliminary software development.
GPUs can be shared by more than one user. However, the memory is limited (typically 12 GB on public systems), so in practice only one or two users can share a GPU at a time. To avoid having one user dominate, software on the ilab systems assigns 4 GPUs to you when you log in. For public systems (primarily ilab1 – ilab4), please do not use more than one system unless it’s evident that no one else is using them.
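To see which GPUs a session can actually use, you can ask Pytorch what it sees. This is a minimal sketch, assuming Pytorch is importable in your current Python. The helper name visible_gpus is made up for illustration, and it returns an empty list if Pytorch or a usable GPU is missing. How the ilab software enforces the assignment is not described here; the sketch only reports the result.

```python
def visible_gpus():
    """Return (index, name, total memory in GB) for each GPU Pytorch can see."""
    try:
        import torch
    except ImportError:
        # Pytorch is not installed in this interpreter.
        return []
    if not torch.cuda.is_available():
        return []
    gpus = []
    for i in range(torch.cuda.device_count()):
        props = torch.cuda.get_device_properties(i)
        gpus.append((i, props.name, round(props.total_memory / 1024**3, 1)))
    return gpus


for index, name, total_gb in visible_gpus():
    print(f"GPU {index}: {name}, {total_gb} GB total")
```

The figure reported is the total memory on the card, not what is currently free. The nvidia-smi command shows current usage by other processes.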
The Cuda Toolkit is a set of APIs from Nvidia, designed to make it easy to write programs using GPUs. Among other things, it provides a uniform interface that applies to many different models of GPU. There are alternatives, e.g. OpenCL, but Cuda is used most commonly here. Cuda has bindings for Python and most other major programming languages. Work in our department uses primarily Python.
We install the latest version of Cuda on all systems with appropriate GPUs. Many users have existing code that requires older versions. You can ask email@example.com to install previous versions on your system. However, this may not be possible: for example, Ubuntu 20 supports Cuda 11, but no older versions. See the section on containers, below, for a way to use older versions of software.
In this department, GPU-based work is done primarily in Python. We would be happy to support users with other languages, but most tools currently installed are for Python. The most common tool is Pytorch, but we also have some usage of Tensorflow.
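As an illustration of how little GPU-specific code a Pytorch program needs, here is a small sketch that multiplies two random matrices on a GPU when one is available and on the CPU otherwise. The function name matmul_demo and the matrix size are arbitrary choices for this example, and the function returns None if Pytorch is not installed.

```python
def matmul_demo(n=256):
    """Multiply two n x n random matrices; return (result shape, device type)."""
    try:
        import torch
    except ImportError:
        return None  # Pytorch is not available in this interpreter
    # Use the GPU if Cuda is usable, otherwise fall back to the CPU.
    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
    a = torch.rand(n, n, device=device)
    b = torch.rand(n, n, device=device)
    c = a @ b  # runs on whichever device holds a and b
    return tuple(c.shape), device.type


print(matmul_demo())
```

The same code runs unchanged on a desktop GPU, a larger research system, or a machine with no GPU at all. Only the device selection line differs in effect.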
Pytorch, Tensorflow, and other tools are installed in anaconda environments. DO NOT simply type “python.” On many systems that will give you an older version of Python, without access to the GPU-related tools. Instead use the most recent version of anaconda that you can. Anaconda is a packaged environment for Python. It has most of the major tools. We add additional ones on request to firstname.lastname@example.org.
The Anaconda environments are located in /koko/system/anaconda/envs/. Currently we have python36 through python38, but we’ll be adding new versions as they become available. Type conda env list to see the current list.
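To confirm which interpreter a session is really using, a short check like the following can help. It is only a sketch: it assumes each environment's interpreter lives somewhere under /koko/system/anaconda/envs/NAME/, and the helper name env_name is made up for illustration.

```python
import sys
from pathlib import PurePosixPath

# Location of the shared environments, as given on this page.
ENVS_ROOT = PurePosixPath("/koko/system/anaconda/envs")


def env_name(executable):
    """Return the environment name if executable is under ENVS_ROOT, else None."""
    try:
        relative = PurePosixPath(executable).relative_to(ENVS_ROOT)
    except ValueError:
        return None  # not inside the shared Anaconda environments
    return relative.parts[0] if relative.parts else None


name = env_name(sys.executable)
if name is None:
    print(f"{sys.executable} is not one of the shared Anaconda environments")
else:
    print(f"Running Python {sys.version.split()[0]} from environment {name}")
```

If the first message appears, you are probably running the system Python and will not see the GPU-related tools.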
To pick your version, see Using Python on CS Linux Machine. For most purposes, it’s sufficient to add the appropriate environment to your path, e.g.
Adding your own software
We have tried to put all the commonly-used software in our Anaconda environments. If you need more, there are two options:
- Install individual packages using pip install --user. That causes them to be installed in your home directory, in ~/.local/lib/pythonM.N. This is a reasonable approach if you have a few packages you need.
- Install your own Python environment, using either a venv or your own Anaconda distribution. Because Anaconda distributions are large, and our home directories have limited quotas, you may want to put an Anaconda distribution in /common/home/NETID (if you’re a grad student or faculty).
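To check where user-installed packages go for the interpreter you are running, the standard site module reports the directory. This is a small sketch using only the standard library. The exact path depends on the Python version and your home directory.

```python
import site
import sys

# Directory where "pip install --user" places packages for this interpreter.
user_site = site.getusersitepackages()
print("User site-packages:", user_site)

# User site-packages can be disabled, for example inside a venv.
print("Enabled:", bool(site.ENABLE_USER_SITE))

# The directory appears on sys.path only if it exists and is enabled.
print("On sys.path:", user_site in sys.path)
```

Because the directory name includes the Python version, packages installed this way for one environment are not visible from an environment with a different Python version.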
As noted above, you may need a different version of Cuda, Pytorch, or other software than what we have installed. For this we recommend using a container. A container is in some respects like a virtual machine: it has its own set of software. But it’s not as isolated from the underlying operating system. It has the same users, processes, and user file systems. It is really a way of delivering a specific set of software that is different from what is installed on the main system.
Nvidia supplies official containers that have Cuda, Pytorch, Tensorflow, and many other tools. They issue new containers once a month, but keep the old ones archived. That lets you get most reasonable combinations of versions by running in an old container. If Nvidia comes out with new software that we haven’t installed yet, you may also be able to use a container to run a newer version.
Because older versions of Cuda won’t install on Ubuntu 20, we’re initially suggesting this to get Cuda 9 or 10 on Ubuntu 20. However in the long run we’ll probably use containers more widely.
We have downloaded the Nvidia containers that we think you’re most likely to want to /koko/system/nvidia-containers. In that directory there are also INDEX files listing all available containers and the versions of major software they support. If you need a container that we haven’t provided, we can easily download it.
In addition to downloading them, we have converted them from Docker to Singularity .sif files. We recommend Singularity rather than Docker, because Singularity gives you roughly the same environment inside the container as outside. Docker is harder to work with. Singularity was created for the HPC community, and is also used by the Rutgers HPC center (OARC).
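To see which converted containers are available, you can list the .sif files in the container directory. The following is a sketch only: it assumes the .sif files sit directly in /koko/system/nvidia-containers rather than in subdirectories, and the helper name list_containers is made up for illustration. The INDEX files remain the authoritative description of what each container holds.

```python
from pathlib import Path

# Directory named on this page; the flat layout of .sif files is an assumption.
CONTAINER_DIR = Path("/koko/system/nvidia-containers")


def list_containers(directory=CONTAINER_DIR):
    """Return the sorted names of .sif files directly inside directory."""
    directory = Path(directory)
    if not directory.is_dir():
        return []  # e.g. running on a machine without this file system
    return sorted(p.name for p in directory.glob("*.sif"))


for name in list_containers():
    print(name)
```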
To use a container, type singularity run --nv CONTAINER, where CONTAINER is the path to the container’s .sif file. You’ll get a bash shell, with all of your normal file systems available. It’s much like being in a separate virtual machine, except that we don’t have to dedicate specific resources to it. The software versions are those listed in the INDEX file or, in more detail, in the release notes that the INDEX file points to. The --nv (Nvidia) option is needed to get access to GPUs in the container.
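Once inside a container, it can be worth confirming that the versions match what the INDEX file promised. The sketch below assumes the container's Python has Pytorch installed and falls back gracefully if it does not. The function name report_versions is made up for illustration.

```python
import platform


def report_versions():
    """Collect the Python, Pytorch, and Cuda versions visible to this interpreter."""
    info = {"python": platform.python_version(), "torch": None,
            "cuda": None, "gpu_available": False}
    try:
        import torch
    except ImportError:
        return info  # no Pytorch in this environment
    info["torch"] = torch.__version__
    info["cuda"] = torch.version.cuda  # None for CPU-only builds
    info["gpu_available"] = torch.cuda.is_available()
    return info


for key, value in report_versions().items():
    print(f"{key}: {value}")
```

If gpu_available is False inside a container on a GPU system, one thing to check is whether the container was started with the --nv option.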
You can install additional python software for the container as described above. Because your home directory and /common are the same inside the container as outside, it works just the same. The containers all have a specific version of python installed, designed to work with the versions of pytorch or tensorflow. However if you’re using the container just to get an old version of Cuda, you can certainly use one of the versions of python from /koko/system, as discussed above.
These containers have software intended to run code. They may not have everything you want for development. (In particular, there’s no emacs editor.) Thus you may want to maintain a separate window on the main machine to do things other than running your program. The user files are the same inside and outside the container. In fact, even the processes you see with ps are the same inside and outside the container (though usernames other than your own won’t show inside the container).