6 using slurm, 1 introduction, 2 slurm commands – HP XC System 2.x Software User Manual
Page 71: Table 6-1: slurm commands, Chapter 6, Using slurm

6 Using SLURM
6.1 Introduction
HP XC uses the Simple Linux Utility for Resource Management (SLURM) for system resource
management and job scheduling. SLURM is a reliable, efficient, open source, fault-tolerant,
job and compute resource manager with features that make it suitable for large-scale,
high-performance computing environments. SLURM can report on machine status and perform
partition management, job management, and job scheduling.
The SLURM Reference Manual is available on the HP XC Documentation CD-ROM and from
the following Web site:
http://www.llnl.gov/LCdocs/slurm/
As a system resource manager, SLURM has the following key functions:
• Allocate exclusive and/or non-exclusive access to resources (compute nodes) to users for
  some duration of time so they can perform work
• Provide a framework for starting, executing, and monitoring work (normally a parallel
  job) on the set of allocated nodes
• Arbitrate conflicting requests for resources by managing a queue of pending work
Section 1.4.3 describes the interaction between SLURM and LSF.
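The three functions listed above (allocating nodes, running work on them, and queueing requests that cannot be satisfied yet) can be pictured with a small toy model. The sketch below is purely illustrative: the class name `ToyResourceManager`, its methods, the node names, and the strict first-in, first-out policy are all assumptions made for this example, and they do not describe SLURM's actual data structures or scheduling algorithm.

```python
from collections import deque


class ToyResourceManager:
    """Hypothetical toy model of a resource manager; not SLURM's implementation."""

    def __init__(self, nodes):
        self.free = list(nodes)   # idle compute nodes
        self.pending = deque()    # queue of (job_id, nodes_needed), oldest first
        self.running = {}         # job_id -> nodes allocated exclusively to it
        self._next_id = 1

    def submit(self, nodes_needed):
        """Queue a request for nodes and start it at once if resources allow."""
        job_id = self._next_id
        self._next_id += 1
        self.pending.append((job_id, nodes_needed))
        self._schedule()
        return job_id

    def finish(self, job_id):
        """Release a running job's nodes, then try to start waiting work."""
        self.free.extend(self.running.pop(job_id))
        self._schedule()

    def _schedule(self):
        # Assumed policy: strict FIFO. The oldest pending job starts as soon
        # as enough nodes are free; later jobs never overtake it.
        while self.pending and self.pending[0][1] <= len(self.free):
            job_id, count = self.pending.popleft()
            self.running[job_id] = [self.free.pop(0) for _ in range(count)]
```

For instance, on a hypothetical four-node system, a three-node job starts immediately, a following two-node job waits in the pending queue, and it starts only once the first job finishes and releases its nodes.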
6.2 SLURM Commands
Users interact with SLURM through its command line utilities. SLURM has the following basic
commands: srun, scancel, squeue, sinfo, and scontrol, which can run on any node in the HP XC
system. These commands are summarized in Table 6-1 and described in the following sections.
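As a rough illustration of how these utilities are typically invoked, the following Python sketch builds argument lists for three of them and runs a command only if it is found on the current node. The helper names (`srun_cmd`, `squeue_cmd`, `scancel_cmd`, `run_if_available`) are hypothetical. The options used (`-n` for `srun`, `-u` for `squeue`, `-s` for `scancel`) appear in the SLURM man pages, but whether they behave identically in the SLURM version shipped with a given HP XC release is an assumption that should be checked against the local man pages.

```python
import shlex
import shutil
import subprocess


def srun_cmd(ntasks, program):
    """Argument list to launch `program` as `ntasks` tasks under srun."""
    return ["srun", "-n", str(ntasks)] + list(program)


def squeue_cmd(user=None):
    """Argument list to show the job queue, optionally for one user only."""
    cmd = ["squeue"]
    if user is not None:
        cmd += ["-u", user]
    return cmd


def scancel_cmd(job_id, signal=None):
    """Argument list to cancel a job, or to send it `signal` instead."""
    cmd = ["scancel"]
    if signal is not None:
        cmd += ["-s", signal]
    cmd.append(str(job_id))
    return cmd


def run_if_available(cmd):
    """Run `cmd` if its executable is on PATH; otherwise only print it."""
    if shutil.which(cmd[0]) is None:
        print("command not found; would run:", shlex.join(cmd))
        return None
    return subprocess.run(cmd, capture_output=True, text=True, check=False)
```

A call such as `run_if_available(srun_cmd(4, ["hostname"]))` would launch four tasks on a system where `srun` is installed, and `scancel_cmd(1234)` builds the command to cancel a job whose JobID (here a made-up value) was read from `squeue` output.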
Table 6-1: SLURM Commands
Command    Function

srun       Submits jobs to run under SLURM management. srun is used to submit a job for
           execution, allocate resources, attach to an existing allocation, or initiate
           job steps. srun can:
           • Submit a batch job and then terminate
           • Submit an interactive job and then persist to shepherd the job as it runs
           • Allocate resources to a shell and then spawn that shell for use in running
             subordinate jobs

squeue     Displays the queue of running and waiting jobs (or "job steps"), including the
           JobID (used for scancel) and the nodes assigned to each running job. It has a
           wide variety of filtering, sorting, and formatting options. By default, it
           reports the running jobs in priority order and then the pending jobs in
           priority order.

scancel    Cancels a pending or running job or job step. It can also be used to send a
           specified signal to all processes on all nodes associated with a job. Only job
           owners or administrators can cancel jobs.