Scheduler for GPU Jobs
We have a set of machines intended for jobs that need large amounts of memory and GPUs. These machines are coordinated by the Slurm workload manager. Without Slurm, users cannot access any GPUs on them, and resource limits are enforced.
List of Managed Systems
- iLab1.cs.rutgers.edu – iLab4.cs.rutgers.edu
- rLab1.cs.rutgers.edu – rLab6.cs.rutgers.edu
These systems provide a total of 75 GPUs.
Why Use a Scheduler?
- The scheduler will put your job on a system with free resources. It doesn’t matter which system you log into.
- The scheduler tries to give each user a fair share of the system. It also gives priority to jobs that are shorter or use fewer GPUs.
- If resources are available, you may run several jobs at the same time, up to per-user limits.
- With Slurm, the Limitation Enforced on CS Linux Machines for long-running jobs and memory does not apply to jobs scheduled via `sbatch` or `srun`.
- You can tell the scheduler how many GPUs you want (up to 4), how much memory you need (80GB by default, up to about 1TB), and how long your job will last (up to 7 days); see the sketch after this list.
- The scheduler enforces its own, larger limits. As of Jan 1, 2024, the default memory will be 40GB.
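For a rough sense of how these requests look on the command line, here is a minimal sketch (the GPU count, memory amount, and time limit are placeholder values, and `myJob` is a batch script as explained under Batch Mode below):

```bash
# Hypothetical request: 2 GPUs, 100GB of memory, and a 2-day time limit
sbatch -G 2 --mem=100g --time=2-00:00:00 myJob
```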
Machines without the Scheduler
- Every iLab desktop machine has a single GPU. These GPUs are not part of the scheduler.
- If you need a machine with multiple GPUs on a first-come, first-served basis, you can use iLabU.cs.rutgers.edu, a special machine we often use to test the latest version of Ubuntu LTS. It has 8 x GeForce RTX 2080 Ti GPUs. These 8 GPUs are not managed by Slurm and are subject to the Limitation Enforced on CS Linux Machines.
Getting Started
You can use the Slurm job scheduler in interactive or batch mode. Each has its own pros and cons.
NOTE: The `nvidia-smi` command won't show you anything on a system that has Slurm running unless you run it within a Slurm job. If you want to know whether a GPU is available for your batch job, you can run `srun -G 1 nvidia-smi`. This should give you an indication that a single GPU is available. A special copy, `nvidia-smi-priv`, can be used outside Slurm to see what GPUs the current machine has.
Interactive Session (for testing only, please!)
- An interactive session must be run from a command line or a terminal.
- The simplest approach is to ask for an interactive session. Type
`srun -G 4 --pty python3 prog.py`
This allocates the first 4 available GPUs, even though there might be other machines with free GPUs. `-G` indicates how many GPUs you want. Currently you can request anything from 1 to 4; if GPUs are available, you'll get them. Note that you may end up on a different computer from the one where you typed the `srun` command, depending on where there are free GPUs. Important: you must specify the `-G` option or no GPU is assigned to you.
- If no GPU is free, it will wait for free GPUs and then start. In this case you might prefer to submit a batch job. (See next section.)
- Note that the command will run in a completely different context. It will start by doing a "cd" to the directory you're currently in, but if other setup is needed, create a script that does the setup and run that script rather than running the program directly (see the sketch after this list). Of course, if your setup is done automatically by .bashrc, that will work.
- If you need to use graphics, use
`srun --x11=first -G 4 --pty python3 prog.py`
Of course you'll need to run srun in a graphical session. Log in using RDP or https://weblogin.cs.rutgers.edu.
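Here is a minimal sketch of the wrapper-script approach mentioned above (the file name `run_prog.sh` and the directory, environment, and program names are only placeholders):

```bash
#!/bin/bash -l
# run_prog.sh -- hypothetical wrapper: do the setup, then run the program
cd YOURDIR          # directory that holds your code and data
activate YOURENV    # e.g. activate your Anaconda environment
python3 prog.py     # finally run the program
```

You could then start it interactively with something like `srun -G 4 --pty bash run_prog.sh`.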
Batch Mode (recommended)
Batch mode requires you to put your commands in a file and run it as a batch job. Once submitted, your job will start as soon as GPUs are available:
- Put the commands you want to execute in a file, e.g. `myJob`
- Submit the job using `sbatch -G 4 myJob`, where the number after `-G` is the number of GPUs you want. (See below for large-memory jobs.)
- You can see what jobs are running using the command `squeue`
- You can cancel a job using `scancel NNN`, where NNN is the job number shown in `squeue`.
- If there are a lot of jobs in the queue, you might want to test your job to make sure you haven't made a mistake in the file. You can use `sbatch myJob`, i.e. without `-G`. However, please cancel the job once you verify that it starts properly. These systems should only be used for jobs that use GPUs.
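Putting those steps together, a typical session might look like this sketch (the job ID 12345 is made up for illustration):

```bash
sbatch -G 4 myJob    # submit myJob, asking for 4 GPUs
squeue -u $USER      # check your jobs: PD = pending, R = running
scancel 12345        # cancel a job, using the job ID shown by squeue
```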
What goes in your batch file
The file you submit with `sbatch` must contain every command you need to execute your program.
- Remember, it may run on a different computer. It needs all the commands you’d have to type after logging in to get to the point where you can run.
- It must begin with `#!/bin/bash`. We recommend using `#!/bin/bash -l` (that's a lowercase L, not a one), which will cause it to read your .bash_profile, etc.
- At a minimum it needs a `cd` to get to the directory with your files.
- If you're using Python in an Anaconda environment, it needs to "activate" that environment.
- You should probably include `#SBATCH --output=FILE` unless you prefer to type `--output` when you submit the job.
- Here's an example for executing your Python code in YOURENV:
```bash
#!/bin/bash -l
#SBATCH --output=logfile
cd YOURDIR
activate YOURENV
python YOURPROGRAM
```
- Here's an example for executing a Singularity container:
```bash
#!/bin/bash -l
#SBATCH --output=logfile
cd YOURDIR
# Important: make sure your code autoruns upon container execution
# and terminates upon code completion
singularity run --nv SINGULARITY_CONTAINER.sif
```
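The resource options described under Additional Details below (GPU count, memory, and so on) can also be embedded in the script itself as `#SBATCH` lines. A rough sketch combining them (the GPU count and memory amount are illustrative, and the directory, environment, and program names are placeholders):

```bash
#!/bin/bash -l
#SBATCH -G 2              # hypothetical: ask for 2 GPUs
#SBATCH --mem=64g         # hypothetical: request 64GB instead of the default
#SBATCH --output=logfile  # write program output to "logfile"
cd YOURDIR
activate YOURENV
python YOURPROGRAM
```

With the options in the file, the job can be submitted with plain `sbatch myJob`.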
For more details on sbatch scripts, see the Slurm documentation.
Additional Details
- The maximum number of GPUs we let you allocate in a single job is currently 4. That will go up as we add more computers to the system. If you want to use more than 4, submit several jobs.
- If a job runs longer than a week it will be killed if others want to use the GPUs. (There have been a few cases where someone needs to run a job longer than a week. If that’s essential, let us know and we’ll find a way to avoid killing the job.)
- The scheduler also controls memory. By default, we allocate jobs 80GB of memory. As of Jan 1, 2024, the default memory will be 40GB. However, you can specify less, e.g. when using `sbatch` add `--mem=32g`. In a few cases that might allow a job to run that otherwise couldn't. You can specify up to 1TB, but if you do that your job can only run on one of the four systems, and it may have to wait if other jobs are using memory. Please do not specify large amounts of memory unless you absolutely need it, as it will limit what other people can do.
- If you look at Slurm documentation, you'll see lots of examples where all the commands in the file start with `srun`. That's not necessary or even a good idea here. Use `sbatch` whenever possible!
- Because we have many kinds of GPUs, we have defined features to make them easy to specify. To see a list of all nodes and their specific features, use `sinfo -o "%25N %50f"`. Its output is a table with columns NODELIST and AVAIL_FEATURES, one row per group of nodes (ilab[1-2]; ilab3,rlab1; ilab4,rlab[2,4]; rlab3; rlab5; rlab6), with the GPU features of each group in the second column.
- If you wanted to use an RTX A4000 you could specify `-C a4000` or `-C ampere`. If you want to use either a 1080 Ti or a TITAN X, you could specify `-C '1080ti|titanx'` or `-C pascal`. Note that OR is specified by `|`. Pascal and Ampere are architectures; cards with the same architecture have the same features, but differ in the amount of memory and number of cores.
- You can also request a specific node using `-w NODE`, e.g. `-w rlab2`. (A combined example using several of these options appears after this list.)
- The only resources we control are GPUs and memory. The scheduler makes no attempt to schedule CPUs. Slurm has the ability to run a single job across multiple computers; we don't recommend using that. Instead, use multiple jobs. If you want more details on CPUs, memory, state, weight, and features, use `sinfo -Nel`
- You will probably want the output of your program to go into a file. You can use `-o FILENAME` in the `sbatch` command to specify an output file.
- In case you need notifications, both `sbatch` and `srun` have the options `--mail-type` and `--mail-user`, which you can use to say which types of events you want to be notified about and where to send the email.
- If you log in to one of the systems and don't use `sbatch` or `srun`, you won't have access to any GPUs. However, `nvidia-smi` will show you all the GPUs, so you can see what's going on. From within a batch or `srun` job, `nvidia-smi` will only show you the GPUs you have allocated.
- You can put options in the file. E.g. rather than using `sbatch -G 4 -o logfile`, you could put
```bash
#SBATCH -G 4
#SBATCH -o logfile
```
in the file. All `#SBATCH` lines must be at the beginning of the file (right after the `#!/bin/bash`).
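As a rough illustration of how several of these options combine on one command line (the GPU count, feature, memory amount, output file, and email address are all placeholder values):

```bash
# Hypothetical submission: 2 Pascal-architecture GPUs, 64GB of memory,
# output written to "logfile", and email when the job ends or fails.
sbatch -G 2 -C pascal --mem=64g -o logfile \
      --mail-type=END,FAIL --mail-user=YOUR_EMAIL myJob
```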
Jobs Information
- To see the list of jobs currently managed by Slurm, type `squeue`
- To see info about your jobs, type `scontrol show job jobid`, where jobid is obtained from `squeue` (see the sketch after this list)
- To see how much memory (shown in MaxVMSize) and other parameters your jobs used in the past 6 months, type:
`sacct -u $USER -S now-180days -o JobID,User,MaxRSS,MaxVMsize,ReqMem,Submit,Start,State,AllocTRES,Nodelist,Reason`
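For a quick look at a single job, a small sketch of that workflow (the job ID 12345 is made up for illustration):

```bash
squeue -u $USER            # find the job ID of your pending or running job
scontrol show job 12345    # show details of that job, including its allocated resources
```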
Common Slurm commands
- `sacct`: show accounting data for all jobs and job steps
- `sacctmgr`: view and modify Slurm account information
- `salloc`: obtain an interactive job allocation
- `sattach`: attach to a running job step
- `sbatch`: submit a batch script to Slurm
- `scancel`: cancel jobs, job arrays or job steps
- `scontrol`: view or modify Slurm configuration and state
- `sdiag`: show scheduling statistics and timing parameters
- `sinfo`: view information about Slurm nodes and partitions
- `sprio`: show the components of a job's scheduling priority
- `squeue`: show job queues
- `sreport`: show reports from job accounting and statistics
- `srun`: run task(s) across requested resources
- `sshare`: show the shares and usage for each user
- `sstat`: show status information for a running job/step
- `sview`: graphical user interface to view and modify Slurm state
For help with our systems or if you need immediate assistance, visit the LCSR Operator at CoRE 235 or call 848-445-2443. Otherwise, see the CS HelpDesk. Don't forget to include your NetID along with a description of your problem.