
HP XC System 3.x Software User Manual

9 Using SLURM

HP XC uses the Simple Linux Utility for Resource Management (SLURM) for system resource management
and job scheduling.

This chapter addresses the following topics:

“Introduction to SLURM”

“SLURM Utilities”

“Launching Jobs with the srun Command”

“Monitoring Jobs with the squeue Command”

“Terminating Jobs with the scancel Command”

“Getting System Information with the sinfo Command”

“Job Accounting”

“Fault Tolerance”

“Security”

9.1 Introduction to SLURM

SLURM is a reliable, efficient, fault-tolerant, open-source job and compute resource manager with features
that make it suitable for large-scale, high-performance computing environments. SLURM can report on
machine status and perform partition management, job management, and job scheduling.

The SLURM Reference Manual is available on the HP XC Documentation CD-ROM and from the following
Web site:

http://www.llnl.gov/LCdocs/slurm/

SLURM manpages are also available online on the HP XC system.

As a system resource manager, SLURM has the following key functions:

Allocate exclusive and/or non-exclusive access to resources (compute nodes) to users for some duration
of time so they can perform work

Provide a framework for starting, executing, and monitoring work (normally a parallel job) on the
set of allocated nodes

Arbitrate conflicting requests for resources by managing a queue of pending work
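
The three functions above can be illustrated with a toy model. The sketch below is not SLURM code and does not reflect SLURM's actual scheduling algorithm; the class name ToyResourceManager, its methods, and the node names are all hypothetical. It assumes exclusive node allocation and a strict first-in, first-out queue, purely to show how allocation, job start, and queue arbitration fit together.

```python
from collections import deque


class ToyResourceManager:
    """Toy model only: FIFO queue over a fixed pool of node names."""

    def __init__(self, nodes):
        self.free = set(nodes)   # nodes not currently allocated
        self.pending = deque()   # queue of (job_name, nodes_needed)
        self.running = {}        # job_name -> set of allocated nodes

    def submit(self, job, nodes_needed):
        """Queue a request, then start whatever now fits."""
        self.pending.append((job, nodes_needed))
        self._schedule()

    def finish(self, job):
        """Release a finished job's nodes and re-run scheduling."""
        self.free |= self.running.pop(job)
        self._schedule()

    def _schedule(self):
        # Strict FIFO: only the job at the head of the queue may start.
        while self.pending and self.pending[0][1] <= len(self.free):
            job, needed = self.pending.popleft()
            # Exclusive allocation: each node goes to exactly one job.
            self.running[job] = {self.free.pop() for _ in range(needed)}


rm = ToyResourceManager(["n1", "n2", "n3", "n4"])
rm.submit("jobA", 3)   # starts at once: 3 of the 4 nodes are allocated
rm.submit("jobB", 2)   # waits in the queue: only 1 node is free
rm.finish("jobA")      # frees 3 nodes, so jobB can now start
```

In this sketch the second request conflicts with the first and is held as pending work until enough nodes are released.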

“How LSF-HPC and SLURM Interact” describes the interaction between SLURM and LSF-HPC.

9.2 SLURM Utilities

You interact with SLURM through its command line utilities. The basic utilities are listed here:

srun

squeue

scancel

sinfo

scontrol

For more information on any of these utilities, see the SLURM Reference Manual or the corresponding
manpage.
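
As a quick check before using these utilities, a script can confirm that they are on the search path. The following is a minimal sketch using only the Python standard library; the function name find_slurm_utilities is hypothetical, and the sketch assumes nothing about where SLURM is installed on a given system.

```python
import shutil

# The five basic utilities named in this section.
SLURM_UTILITIES = ["srun", "squeue", "scancel", "sinfo", "scontrol"]


def find_slurm_utilities(names=SLURM_UTILITIES):
    """Map each utility name to its full path, or None if it is not on PATH."""
    return {name: shutil.which(name) for name in names}


if __name__ == "__main__":
    for name, path in find_slurm_utilities().items():
        print(f"{name:10s} {path or 'not found on PATH'}")
```

On a system without SLURM installed, every entry is reported as not found rather than raising an error.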

9.3 Launching Jobs with the srun Command

The srun command submits and controls jobs that run under SLURM management. Use it to submit
interactive and batch jobs for execution, allocate resources, and initiate job steps.

The srun command handles both serial and parallel jobs.

The srun command has many options that give you close control over how your application executes.
However, you can also use it for a simple launch of a serial program, as Example 9-1 shows.
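
A simple serial launch of this kind can also be driven from a script. The sketch below is illustrative only and is not taken from the manual's own examples: the helper names build_srun_command and launch are hypothetical, hostname is used as a stand-in serial program, and -n is the srun option for the number of tasks. The script runs srun only if it is found on the search path; otherwise it prints the command it would have run.

```python
import shlex
import shutil
import subprocess


def build_srun_command(program, args=(), ntasks=None):
    """Return the argument list for launching `program` under srun."""
    cmd = ["srun"]
    if ntasks is not None:
        # -n sets the number of tasks (processes) to launch.
        cmd += ["-n", str(ntasks)]
    cmd.append(program)
    cmd.extend(args)
    return cmd


def launch(program, args=(), ntasks=None):
    """Run the program under srun if available; otherwise only show the command."""
    cmd = build_srun_command(program, args, ntasks)
    if shutil.which("srun") is None:
        print("srun not found; would run:", shlex.join(cmd))
        return None
    return subprocess.run(cmd, capture_output=True, text=True, check=False)


if __name__ == "__main__":
    result = launch("hostname")  # simplest case: one serial program, no options
    if result is not None:
        print(result.stdout, end="")
```

Building the argument list separately from running it keeps the command easy to inspect and avoids shell quoting problems.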
