HP XC System 3.x Software User Manual

Example 9-1 Simple Launch of a Serial Program

$ srun hostname
n1

9.3.1 The srun Roles and Modes

The srun command submits jobs to run under SLURM management. The srun command can perform
many roles in launching and managing your job. The srun command operates in several distinct usage
modes to accommodate the roles it performs.

9.3.1.1 The srun Roles

The options of the srun command allow you to control a SLURM job by:

Specifying the parallel environment for your job when you submit it, such as the number of nodes
to use, partition, distribution of processes among nodes, and maximum time.

Controlling the behavior of your parallel job as it runs, such as by redirecting or labeling its output,
sending it signals, or specifying its reporting verbosity.
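
The two roles above can be combined on one command line. The following is a minimal sketch, not taken from the manual: the node count, partition name, time limit, and program are hypothetical values, and the script only prints the assembled command unless RUN_SRUN=1 is set on a system where srun is installed.

```shell
#!/bin/sh
# Sketch: assemble an srun command line covering both roles.
# All values below are hypothetical and site-specific.

NODES=4            # number of nodes to use
PARTITION=debug    # partition name (assumed)
TIME_LIMIT=10      # maximum run time, in minutes
PROGRAM=hostname   # program to launch

build_srun_cmd() {
    # Role 1: describe the parallel environment (-N nodes, -p partition, -t time limit).
    # Role 2: control run-time behavior (-l labels each output line with its task number).
    echo "srun -N $NODES -p $PARTITION -t $TIME_LIMIT -l $PROGRAM"
}

CMD=$(build_srun_cmd)
echo "$CMD"

# Dry run by default; launch only when explicitly requested and srun exists.
if [ "${RUN_SRUN:-0}" = 1 ] && command -v srun >/dev/null 2>&1; then
    $CMD
fi
```

Keeping the options in variables makes it easy to see which ones set up the job and which ones affect its output while it runs.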

9.3.1.2 The srun Modes

The srun command has five distinct modes in which it can be used:

Simple mode

Batch mode

Allocate mode

Attach mode

Batch (with LSF-HPC) mode

The SLURM Reference Manual describes the Simple, Batch, Allocate, and Attach modes.

You can submit a script to LSF-HPC that contains (simple) srun commands to execute parallel jobs later.
In this case, LSF-HPC takes the place of the srun -b option for indirect, across-machine job-queue
management.
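
As an illustration of such a script, the sketch below writes a small job script containing simple srun commands and checks its syntax. The file name myjob.sh and the two job steps are hypothetical, and the bsub line in the closing comment is only an assumed example of how a script might be submitted; see the LSF-HPC chapter for the options that apply on your system.

```shell
#!/bin/sh
# Sketch: create a hypothetical job script (myjob.sh) that contains
# simple srun commands, then syntax-check it. Nothing is launched here.

cat > myjob.sh <<'EOF'
#!/bin/sh
# Each srun line runs as a simple-mode launch when the script executes.
srun hostname
srun -l uname -r
EOF
chmod +x myjob.sh

# sh -n parses the script without executing any command in it.
sh -n myjob.sh && echo "myjob.sh: syntax OK"

# Submission would then go through LSF-HPC instead of srun -b, for example:
#   bsub -n 4 ./myjob.sh
# (the value 4 is an assumption for illustration)
```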

9.3.2 Using the srun Command with HP-MPI

The srun command can be used as an option in an HP-MPI launch command. See Chapter 5: Submitting Jobs
for information about using srun with HP-MPI.

9.3.3 Using the srun Command with LSF-HPC

The srun command can be used in an LSF-HPC launch command. See Chapter 10: Using LSF-HPC
for information about using srun with LSF-HPC.

9.4 Monitoring Jobs with the squeue Command

The squeue command displays the queue of running and waiting jobs (or "job steps"), including the JobID
(used for scancel) and the nodes assigned to each running job. It has a wide variety of filtering, sorting,
and formatting options. By default, it reports the running jobs in priority order and then the pending jobs
in priority order.

Example 9-2 reports on job 12345 and job 12346:

Example 9-2 Displaying Queued Jobs by Their JobIDs

$ squeue --jobs 12345,12346
JOBID  PARTITION  NAME  USER  ST  TIME_USED  NODES  NODELIST(REASON)
12345  debug      job1  jody  R   0:21       4      n[9-12]
12346  debug      job2  jody  PD  0:00       8
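
Because the listing in Example 9-2 is whitespace-separated text, its columns can be picked apart with standard tools. The sketch below is an illustration only: it works on a copy of the sample output rather than a live squeue call, and it assumes the column order shown in the example (JOBID first, ST fifth).

```shell
#!/bin/sh
# Sample text copied from Example 9-2. On a live system, the output of
# squeue could be piped into the same awk command instead.
squeue_output='JOBID PARTITION NAME USER ST TIME_USED NODES NODELIST(REASON)
12345 debug job1 jody R 0:21 4 n[9-12]
12346 debug job2 jody PD 0:00 8'

# Skip the header line (NR > 1) and print the JobID (field 1) of every
# job whose ST column (field 5) is PD, that is, pending.
pending_jobs=$(printf '%s\n' "$squeue_output" | awk 'NR > 1 && $5 == "PD" { print $1 }')

echo "Pending jobs: $pending_jobs"
```

With the sample data this prints "Pending jobs: 12346". The JobIDs extracted this way are the same identifiers that scancel expects.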

The squeue command can report on jobs in the job queue according to their state; possible states are
pending, running, completing, completed, failed, timeout, and node_fail. Example 9-3 uses the squeue
command to report on failed jobs.
