HII-HPC Cluster
Slurm
The HII-HPC cluster uses the Slurm Workload Manager to submit, monitor, and manage computational jobs running on its compute nodes.
Slurm is an open-source job scheduler used by many of the world's HPC clusters.
Commands
The following are the most common commands used when interacting with Slurm:
- sinfo (docs) - Reports the state of partitions and nodes managed by Slurm.
- sbatch (docs) - Submits a job script for execution, typically containing srun commands to launch parallel tasks.
- srun (docs) - Submits a job for execution or initiates job steps in real time.
- squeue (docs) - Reports the state of jobs or job steps.
- scancel (docs) - Stops jobs before they run or complete.
- sacct (docs) - Shows running jobs as well as recently completed or failed jobs.
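To illustrate how these commands fit together, the sketch below shows a minimal batch script. The partition name hii02 comes from this page; the job name, output file, task count, and time limit are example values, not site requirements.

```shell
#!/bin/bash
#SBATCH --job-name=hello        # example job name
#SBATCH --partition=hii02       # production partition (see Partitions below)
#SBATCH --ntasks=1              # a single task
#SBATCH --time=00:05:00         # example 5-minute time limit
#SBATCH --output=hello-%j.out   # %j expands to the Slurm job ID

srun hostname                   # launch the task on the allocated node
```

Saved as, say, hello.sbatch, it would be submitted with sbatch hello.sbatch, monitored with squeue -u $USER, and reviewed afterwards with sacct -j <jobid>.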
On hii.rc.usf.edu and hii2.rc.usf.edu, you can view documentation using the man command, e.g. man sbatch.
Partitions
Compute nodes in the cluster are grouped into Slurm Partitions which provide the following classes of service:
- hii02 - Partition for production batch jobs (contains the majority of the compute nodes).
- hii-test - Development partition for testing batch jobs.
- hii-interactive - Nodes reserved for interactive shell sessions with real-time feedback.
Use the --partition=<partition> or -p <partition> option with Slurm commands such as sinfo, srun, sbatch, and squeue to indicate which partition to use.
Note: You will have access to the same GPFS Filesystems regardless of the partition you choose.
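As a sketch of the partition option in practice, the invocations below use the partition names from this page; the script name myjob.sbatch is a hypothetical placeholder.

```shell
# Show the state of the nodes in the test partition
sinfo -p hii-test

# Start an interactive shell on the interactive partition
srun --partition=hii-interactive --pty /bin/bash

# Submit a batch script to the production partition
sbatch -p hii02 myjob.sbatch
```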
HII Walkthroughs
Other Resources
- Slurm Overview - Main page for official SchedMD Slurm Documentation.
- Slurm Quick Reference - Quick reference for the major Slurm commands.
- Slurm Rosetta Stone - Command translation table for users familiar with other HPC scheduling systems.
- Stack Exchange