HII-HPC Cluster
Slurm
The HII-HPC cluster uses the Slurm Workload Manager to submit, monitor, and manage computational jobs running on its compute nodes.
Slurm is an open-source job scheduler used by many of the world's HPC clusters.
Commands
The following are the most common commands used when interacting with Slurm:
- sinfo (docs) - Reports the state of partitions and nodes managed by Slurm.
- sbatch (docs) - Submits a job script for execution, typically containing srun commands to launch parallel tasks.
- srun (docs) - Submits a job for execution or initiates job steps in real time.
- squeue (docs) - Reports the state of jobs or job steps.
- scancel (docs) - Stops jobs before they run or complete.
- sacct (docs) - Shows running jobs as well as recently completed or failed jobs.
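To illustrate how these commands fit together, the sketch below shows a minimal batch script. The partition name hii02 comes from this page; the job name, output file, task count, and time limit are example values, not site requirements.

```shell
#!/bin/bash
#SBATCH --job-name=hello        # example job name
#SBATCH --partition=hii02       # production partition (see Partitions below)
#SBATCH --ntasks=1              # a single task
#SBATCH --time=00:05:00         # example 5-minute time limit
#SBATCH --output=hello-%j.out   # %j expands to the Slurm job ID

srun hostname                   # launch the task on the allocated node
```

Saved as, say, hello.sbatch, it would be submitted with sbatch hello.sbatch, monitored with squeue -u $USER, and reviewed afterwards with sacct -j <jobid>.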
On hii.rc.usf.edu and hii2.rc.usf.edu, you can view documentation using the man command, e.g. man sbatch.
Partitions
Compute nodes in the cluster are grouped into Slurm Partitions which provide the following classes of service:
- hii02 - Partition for production batch jobs (contains the majority of the compute nodes).
- hii-test - Development partition for testing batch jobs.
- hii-interactive - Nodes reserved for interactive shell sessions with real-time feedback.
Use the --partition=<partition> or -p <partition> option with Slurm commands such as sinfo, srun, sbatch, and squeue to indicate which partition to use.
Note: You will have access to the same GPFS Filesystems regardless of the partition you choose.
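As a sketch of the partition option in practice, the invocations below use the partition names from this page; the script name myjob.sbatch is a hypothetical placeholder.

```shell
# Show the state of the nodes in the test partition
sinfo -p hii-test

# Start an interactive shell on the interactive partition
srun --partition=hii-interactive --pty /bin/bash

# Submit a batch script to the production partition
sbatch -p hii02 myjob.sbatch
```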
HII Walkthroughs
Other Resources
- Slurm Overview - Main page for official SchedMD Slurm Documentation.
- Slurm Quick Reference - Quick reference for the major Slurm commands.
- Slurm Rosetta Stone - Command translation table for users familiar with other HPC scheduling systems.
- Stack Exchange