Slurm

Overview

The Nova cluster uses Slurm to manage jobs on the compute nodes.  It is the brains of the cluster.  Users submit jobs that specify the hardware resources they will need and for how long.   Slurm schedules the order of the jobs based on hardware availability and fair use.

Topics

Overview
Batch vs Interactive Jobs
Job Options for salloc, sbatch, and srun
Nova OnDemand Options
Interactive Sessions with salloc
Batch Jobs with sbatch
        Using #SBATCH lines in a Bash Script
        Batch Script Generator
Viewing the Job Queue - squeue
seff - Job Efficiency
sacctmgr - Account Info

Batch vs. Interactive Jobs

Users request compute resources in two fundamental ways: interactively and in batch jobs.  

  • Interactive Jobs are requested using the salloc command from the head node.  The salloc command waits until the resources are available, then opens a shell session on the first compute node assigned to the job.  The user works interactively from the Linux command line.  When the user exits the shell session, the job exits.  Interactive jobs are mainly used for software development and testing.  When doing any prolonged interactive work on the cluster, it is recommended that you use salloc to request a compute node.
  • Batch Job requests are submitted using the sbatch command.  The job actions are typically placed in a bash script file.  The sbatch program reads the bash script and adds it to the queue.  The jobs wait in the queue until the resources become available.  They run without any human interaction.  Most cluster computing is done as batch jobs.
  • Nova OnDemand - Nova OnDemand sessions are treated as interactive jobs as well.  Instead of the salloc command, the job options are entered in the OnDemand web form.
  • Parallel Jobs Using srun - If you have a program that needs to execute in parallel, you will often see instructions that say to use a tool like mpirun or orterun to launch the processes.  The srun command is intended to replace mpirun and orterun because it is better integrated with Slurm (see the example below).
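
For example, where an application's documentation says to launch 16 processes with mpirun, the equivalent Slurm-integrated launch uses srun.  The program name my_mpi_app below is only a placeholder:

   # Traditional launch often shown in application documentation:
   $ mpirun -np 16 ./my_mpi_app

   # Slurm-integrated equivalent, run inside a job that requested --ntasks=16:
   $ srun --ntasks=16 ./my_mpi_app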

The options for requesting a job are roughly the same whether you use sbatch, salloc, srun, or Nova OnDemand.  Let's explore these options in depth.

Job Options for salloc, sbatch, and srun

The sbatch, salloc, and srun commands all accept the same options for specifying the resources being requested.  The options below are the most commonly used and the most important to know; each is listed with a description and example usage.  Nova OnDemand uses essentially the same options when you create an OnDemand session.

--nodes=<num>
-N <num>
    The number of nodes.  Can also be expressed as a range from <minimum> to <maximum> number of nodes:
    --nodes=1
    --nodes=4-8

--ntasks=<num>
-n <num>
    The number of parallel tasks this job will run.  The default is 1 (non-parallel jobs).  For parallelized jobs, provide the number of tasks.  Parallel jobs should also include the --cpus-per-task option to specify how many CPU cores each task will use.
    --ntasks=16
    -n 16

--cpus-per-task=<num>
-c <num>
    The number of CPU cores per task.  Slurm defaults to 1 CPU core per task.  However, some applications, such as Abaqus, launch one process that uses multiple CPUs.
    --cpus-per-task=1

--gres=<gres>
    On Nova, this option is mainly used for requesting GPUs for a job.  Some examples for Nova:
    --gres=gpu:1          (request 1 GPU)
    --gres=gpu:a100:2     (request two A100 GPUs)
    Note:  When requesting GPUs, be aware that you can only request a maximum of 6 CPU cores per GPU.

--time=<maxtime>
-t <maxtime>
    Maximum length of time for the job.  Normally expressed as <hours>:<minutes>:<seconds>, or <days>-<hours>:<minutes>:<seconds> for multi-day jobs.
    The maximum time should not be significantly larger than the actual expected job run time, because jobs with long run times can be more difficult to schedule.
    --time=30           (30 minutes)
    --time=8:00:00      (8 hours)
    --time=2-00:00:00   (2 days)

--mem=<size>
    The total memory required for the job.  A value of 0 means that the job can use as much memory as is available.
    Units can be appended to the value:  K (kilobytes), M (megabytes), G (gigabytes).
    If unsure how much memory to request, 5G per CPU is a good starting point, e.g. 16 CPUs = 80G.
    --mem=5000     (5000 megabytes)
    --mem=16G      (16 gigabytes)
    --mem=0        (no limit)
--mem-per-cpu=<size>
    Set the maximum amount of memory each CPU core can use.
    (The --mem option above may be simpler to use.)

--partition=<partition>
    The partition (queue) for the job.  Most Nova jobs use the nova (default) partition.  See the full list of partitions on Nova for the others.
    --partition=nova         (default partition)
    --partition=scavenger    (low-priority jobs that run on any unused systems; scavenger jobs can be bumped)
    --partition=instruction  (used only for students in classes)
    --partition=reserved     (special reserved machines)

--chdir=<dir>
-D <dir>
    Set the working directory for the job.  The path should point to a location under your /work directory or in /ptmp.
    --chdir=/work/ccresearch/jedicker/fluent

--job-name="<jobname>"
    The name of the job as it appears in the queue.
    --job-name="Sample ABC"

--error=<filename>
    The file where error messages are saved while the job is running.  The string "%j" can be used to indicate the job ID.
    --error=job-errors-%j.out

--output=<filename>
    The file where output messages are saved while the job is running.  The string "%j" can be used to indicate the job ID.
    --output=job-output-%j.out

--constraint=<constraint>
    A constraint forces the job to run on hardware with specific features.
    --constraint=intel     (run only on machines with Intel processors)
    --constraint=amd       (run only on machines with AMD processors)
    --constraint=nova21    (run only on nova21 nodes)

--account=<acct>
    The account name the job belongs to.
    --account=research-staff
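
These options can be combined as needed on the command line or in a batch script.  As an illustration (the job name and script name below are just placeholders), a submission using several of the options above might look like:

   $ sbatch --job-name="Sample ABC" --nodes=1 --ntasks=16 --cpus-per-task=1 \
            --mem=80G --time=8:00:00 --partition=nova \
            --output=job-output-%j.out --error=job-errors-%j.out run-job.sh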

Nova OnDemand Job Options

When you log in to Nova OnDemand and request an Interactive App such as Nova Desktop, you will use a web form to specify the same job options you would give to salloc or sbatch.

salloc - Request an interactive session

Any time you are doing a lot of interactive work, you should use a compute node, not the head node.  The salloc command requests an interactive shell on a compute node.  This is the best way to do development and testing of an application.  (For full graphical interactive sessions, see OnDemand.)

Request an interactive session with 1 node, 32 cores, and 128G memory, for 4 hours:
   $ salloc --nodes=1 --ntasks=32 --mem=128G --time=4:00:00

Request an interactive shell session on a system with 1 GPU but only 6 CPU cores:
   $ salloc --nodes=1 --ntasks=6 --cpus-per-task=1 --mem=48G --gres=gpu:1 --time=4:00:00

Request an interactive shell with 4 nodes, 32 cores per node, for 8 hours on the reserved partition:

   $ salloc --nodes=4 --ntasks=128 --ntasks-per-node=32 --time=8:00:00 --partition=reserved
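
A typical interactive workflow might look like the sketch below; the commands run inside the session are only illustrative:

   $ salloc --nodes=1 --ntasks=4 --mem=16G --time=1:00:00
   # salloc waits for the resources, then opens a shell on the assigned compute node
   $ hostname        # confirm you are on a compute node rather than the head node
   $ ./my-test-app   # hypothetical program being developed or tested
   $ exit            # exiting the shell ends the job and releases the resources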

sbatch - Submit Batch Scripts

To submit a batch job, you will need to create a batch script.   A batch script is basically just a text file containing the same commands you would run from the Linux shell.  Keep in mind that the commands must all run unattended (no way to enter passwords, for example).  We use the sbatch command to submit the bash script to Slurm with the Job Options described earlier:

$ sbatch <job-options> <script-file>

So if you have a batch script called run-job.sh, you might submit it with Job Options like so:
$ sbatch -N 1 -n 16 -t 4:00:00 run-job.sh

#SBATCH lines

You can also embed the Job Options in the bash script file.  Each sbatch option is placed on its own line beginning with the text "#SBATCH".

Let's start with a simple batch script file that contains the following lines:

#!/bin/bash
# simple.sh - A simple job script with #SBATCH job options.
# The lines beginning with #SBATCH are read by the sbatch program to set the job parameters.
#SBATCH --job-name="Job 1"                   # The name of the job.
#SBATCH --nodes=1                            # The number of nodes being requested.
#SBATCH --ntasks=8                           # 8 tasks (parallel processes).
#SBATCH --ntasks-per-core=1                  # Assigns one CPU core to each task.
#SBATCH --time=10                            # Set the time required for the job to 10 minutes.
#SBATCH --partition=nova                     # The job queue (aka partition) the job should run in.
# Execute some simple Linux commands: 
date          # print the current date/time
hostname      # print the host name
uname -a      # print system information such as the kernel version
w             # print the load and who is logged in.

The first line of the script, #!/bin/bash, indicates that this is a bash script.   You'll notice that lines that begin with #SBATCH contain a command line option for sbatch.  So when you submit the bash script as a job, sbatch reads the #SBATCH lines to determine what the job options are.  

To submit this script to Slurm, copy and paste the lines above into a file called simple.sh using a text editor like vim.

$ vim simple.sh      #  copy and paste the lines above.

$ sbatch simple.sh
Submitted batch job 6077336

As shown, the job is assigned a Job ID of  6077336 and submitted to the job queue.
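
Once the job is in the queue, its Job ID can be used to check on it or cancel it:

$ squeue -j 6077336     # show the status of this specific job
$ scancel 6077336       # cancel the job if it is no longer needed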

The Batch Script Generator

We provide a batch script generator that can be very helpful for creating batch scripts to use with sbatch.  Enter the desired job details and the generator will create the lines of the batch script.  Just copy and paste the lines into a text editor such as vim or nano.

squeue - View the Job Queue

The squeue command shows information about jobs in the queue.  

Some examples:

  1. List all jobs currently in the queue:

    $ squeue
    JOBID    PARTITION   NAME       USER      ST       TIME    NODES     NODELIST(REASON)
    6076441  nova        Job-test   jedicker  R     1:52:14        1     nova18-48
  2. Show all jobs belonging to user.name:   
    $ squeue -u user.name
  3. Show all jobs belonging to the account account.name:
    $ squeue -A account.name
  4. Show all running jobs from user user.name:  
    $ squeue -t RUNNING -u user.name
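  5. Customize the output columns with the --format (or -o) option; for example, show the job ID, name, state, elapsed time, and node list for user.name's jobs (these are standard squeue format codes):
    $ squeue -u user.name --format="%.10i %.30j %.8T %.10M %R"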

srun - Run parallel commands on the cluster

srun is another alternative for running jobs on the cluster.  In this case, you run the parallel command itself (much like mpirun would), usually from the head node.  It is a good way to launch a single command on a compute node without running it on the head node itself.

    $ srun -N 1 -n 16 -t 2:00:00 my-app
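
For example, to run a single quick command on a compute node from the head node (the resource values here are only illustrative):

    $ srun -N 1 -n 1 -t 10 --mem=4G hostname     # prints the name of the compute node the command ran on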

seff - List Job Efficiency

After you run a job, if you want to get an idea of how much memory and CPU utilization the job needed, you can do:

$ seff <jobid>

where <jobid> is the Job ID.
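
For example, using the Job ID from the sbatch example above:

$ seff 6077336      # reports the job's CPU efficiency and memory efficiency relative to what was requested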
 

sacctmgr - View Slurm account information

Slurm uses a database to keep track of Slurm account information.  One very important piece of account information is the "qos associations" for each account group.  These are special groups used by Slurm to restrict who has access to different resources on the cluster.  To see what qos associations have been set for an account, you can use the sacctmgr command like so:
    $ sacctmgr show assoc user=<username> format=account,qos -p
where <username> is the ISU NetID.  This will return the list of accounts and their qos associations.
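
For example, for a hypothetical user jedicker:
    $ sacctmgr show assoc user=jedicker format=account,qos -p
The -p flag makes the output parsable: each association is printed as a pipe-delimited Account|QOS| line.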