Most research computing in the CS department, and much of our instruction, uses GPUs with Nvidia's Cuda software and applications such as Pytorch and Tensorflow. This page describes these technologies.
GPUs and their allocation
Most of our research and larger instructional systems have 8 Cuda-capable Nvidia GPUs. Desktop systems generally have one smaller GPU that is still Cuda-capable, and thus could be used for courses in GPU programming or preliminary software development.
GPUs can be shared by more than one user. However, GPU memory is limited (typically 12 GB on public systems), so in practice only one or two users can work on a GPU at a time. To avoid having one user dominate, software on the ilab systems assigns 4 GPUs to you when you log in. On the public systems (primarily ilab1 – ilab4), please do not use more than one system unless it's evident that no one else is using them.
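Before picking a machine, it helps to see how busy its GPUs already are. A quick way to check (a sketch, assuming the standard nvidia-smi tool that ships with the Nvidia driver) is:

```shell
# Show per-GPU memory use; prefer a machine whose GPUs are mostly idle.
# nvidia-smi ships with the Nvidia driver, so it should exist on any GPU machine.
if command -v nvidia-smi >/dev/null 2>&1; then
    nvidia-smi --query-gpu=index,memory.used,memory.total --format=csv
else
    echo "nvidia-smi not found -- run this on a GPU-equipped machine"
fi
```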
The Cuda Toolkit is a set of APIs from Nvidia, designed to make it easy to write programs using GPUs. Among other things, it provides a uniform interface that applies to many different models of GPU. There are alternatives, e.g. OpenCL, but Cuda is used most commonly here. Cuda has bindings for Python and most other major programming languages; work in our department is done primarily in Python.
We install the latest version of Cuda on all systems with appropriate GPUs. Many users have existing code that requires older versions. You can ask email@example.com to install previous versions on your system. However, this may not always be possible: Ubuntu 20 supports Cuda 11 but no older versions. See the section on containers, below, for a way to use older versions of software.
In this department, GPU-based work is done primarily in Python. We would be happy to support users with other languages, but most tools currently installed are for Python. The most common tool is Pytorch, but we also have some usage of Tensorflow.
Pytorch, Tensorflow, and other tools are installed in Anaconda environments. DO NOT simply type "python": on many systems that gives you an older version of Python without access to the GPU-related tools. Instead, use the most recent version of Anaconda that you can. Anaconda is a packaged environment for Python that includes most of the major tools; when possible, we can add others on request to firstname.lastname@example.org.
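A quick way to confirm you're in an environment with the GPU tools is to check which Python you're running and whether it can import Pytorch (a sketch; it assumes python3 is on your path and reports, rather than fails, if Pytorch is missing):

```shell
# Confirm which Python you're actually running, then check whether it can
# import torch and see a GPU. If either check fails, fix your PATH first.
which python3
python3 - <<'EOF'
try:
    import torch
    print("torch", torch.__version__, "- cuda available:", torch.cuda.is_available())
except ImportError:
    print("torch not installed in this Python -- are you using the right environment?")
EOF
```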
The Anaconda environments are located in /koko/system/anaconda/envs/. Currently we have python36 through python38, and we'll add new versions as they become available. Type

conda env list

to see the current list.
To pick your version, see Using Python on CS Linux Machine. For most purposes, it's sufficient to add the appropriate environment to your path.
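For example, to use the python38 environment (the environment name here is just an illustration; substitute whatever conda env list shows on your machine):

```shell
# Put the environment's bin directory first on your PATH so that
# `python` and `pip` resolve to it. Adjust "python38" as needed.
export PATH=/koko/system/anaconda/envs/python38/bin:$PATH
command -v python || echo "python not found -- check the environment name"
```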
Adding your own software
We have tried to put all the commonly-used software in our Anaconda environments. If you need more, there are two options:
- Install individual packages using pip install --user. That causes them to be installed in your home directory, in ~/.local/lib/pythonM.N. This is a reasonable approach if you only need a few packages.
- Install your own Python environment, using either a venv or your own Anaconda distribution. Because Anaconda distributions are large and our home directories have limited quotas, you may want to put an Anaconda distribution in /common/home/NETID (if you're a grad student or faculty).
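As a sketch of both options (the directory and environment names are only examples):

```shell
# Option 1: see where `pip install --user` will put packages for this Python.
python3 -m site --user-site

# Option 2: a lightweight venv instead of a full Anaconda distribution.
python3 -m venv "$HOME/myenv"      # "myenv" is an example name
source "$HOME/myenv/bin/activate"  # now `python` and `pip` come from the venv
deactivate
```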
Containers

As noted above, you may need a version of Cuda, Pytorch, or other software that is different from what we have installed. For this we recommend using a container. A container is in some respects like a virtual machine: it has its own set of software. But it is not as isolated from the underlying operating system; it shares the same users, processes, and user file systems. It is really a way of delivering a specific set of software that differs from what is installed on the main system.
Nvidia supplies official containers that have Cuda, Pytorch, Tensorflow, and many other tools. They issue new containers once a month, but keep the old ones archived. That lets you get most reasonable combinations of versions by running the right container.
Because older versions of Cuda won't install on Ubuntu 20, you'll have to use a container on Ubuntu 20 if you need Cuda 9 or 10. However, in the long run we'll probably use containers even for current software.
We have downloaded the Nvidia containers that we think you're most likely to want to /koko/system/nvidia-containers. That directory also contains INDEX files listing all available containers and the versions of the major software they include; look there to see what's available. If you need a container that we haven't provided, we can easily download it.
In the table at the end, you'll see entries like
21.05 1.15.5 or 2.4.0, Ubuntu 20, Cuda 11.3.0, Python 3.8
21.04 1.15.5 or 2.4.0, Ubuntu 20, Cuda 11.3.0, Python 3.8
21.05 is the container version (May 2021). It includes Tensorflow 1.15.5 or 2.4.0, with Ubuntu 20, Cuda 11.3.0, and Python 3.8. The versions at the left margin are the ones we have; the indented versions are available from Nvidia and can be downloaded if you need them.
If you do

ls /koko/system/nvidia-containers

you'll see a list of the files we have. The containers all end in .sif, and the names should match the entries in the index file. E.g., tensorflow:21.05-tf2-py3.sif is version 21.05 with Tensorflow 2.4.0 (the 1.15.5 version would be tf1).
To use a container, simply run it with Singularity, e.g.
singularity run --nv /koko/system/nvidia-containers/tensorflow:21.05-tf2-py3.sif
Once it starts, you’ll be in a bash shell within the container, in your normal home directory. At that point you can do development and run programs as you normally would.
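If you want to run a single script rather than work interactively, singularity exec runs one command in the container and exits (a sketch; my_script.py is a hypothetical script name):

```shell
# Run one command inside the container instead of an interactive shell.
# --nv makes the host's GPUs visible, as with `singularity run`.
SIF=/koko/system/nvidia-containers/tensorflow:21.05-tf2-py3.sif
if command -v singularity >/dev/null 2>&1 && [ -f "$SIF" ]; then
    singularity exec --nv "$SIF" python my_script.py   # my_script.py is hypothetical
else
    echo "singularity or the container image is not available on this machine"
fi
```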
You can install additional Python software for the container as described above, i.e. using pip install --user. Because your home directory is the same inside and outside the container, this works just as it would outside. Of course, you can also install your own Python environment; that will work inside the container as well, though you'll have to make sure your software versions match the Cuda version the container supports.
These containers have software intended to run code; they may not have everything you want for development. (In particular, there's no emacs text editor.) Thus you may want to keep a separate window on the main machine for tasks other than running your program. The user files are the same inside and outside the container. In fact, even the processes you see with ps are the same inside and outside the container (though usernames other than your own won't show inside the container).