
1 srun roles, 2 srun modes – HP XC System 2.x Software User Manual


6.4.1.1 srun Roles

srun options allow you to submit a job by:

Specifying the parallel environment for your job, such as the number of nodes to use,
partition, distribution of processes among nodes, and maximum time.

Controlling the behavior of your parallel job as it runs, such as by redirecting or labeling its
output, sending it signals, or specifying its reporting verbosity.

6.4.1.2 srun Modes

Because srun performs several different roles, it has five distinct
ways, or modes, in which it can be used:

Simple Mode

In simple mode, srun submits your job to the local SLURM job
controller, blocks if necessary until the needed resources are free,
and then initiates all processes on the specified nodes. Many
control options can change the details of this general pattern.

The simplest way to use the srun command is to distribute the
execution of a serial program (such as a Linux utility) across a
specified number or range of compute nodes. For example:

$ srun -N 8 cp ~/data1 /var/tmp/data1

This command copies (cp) the file data1 from your common home
directory into local disk space on each of eight compute nodes. This
is similar to running simple programs in parallel.
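To check the result of a simple-mode command like the one above, a second simple-mode step can list the copied file on each node. The following is a minimal sketch, not taken from the manual: run_or_show is a hypothetical helper that only prints the command when srun is not installed, and the use of the srun -l (label output) option with ls is an illustrative assumption.

```shell
#!/bin/sh
# run_or_show (hypothetical helper): run the command if srun is
# installed, otherwise only print the command line that would be run.
run_or_show() {
    if command -v srun >/dev/null 2>&1; then
        "$@"
    else
        printf '%s\n' "$*"
    fi
}

# List the copied file on each of the eight nodes; -l asks srun to
# label each output line with the task that produced it.
run_or_show srun -N 8 -l ls -l /var/tmp/data1
```

On a system without SLURM the sketch only echoes the command line, so it is safe to try anywhere.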

Batch Mode

srun can also directly submit complex scripts to the job queue(s)
managed by SLURM for later execution when needed resources
become available and when no higher priority jobs are pending.
For example:

$ srun -N 16 -b myscript.sh

This command uses the srun -b option to place myscript.sh
into the batch queue to run later on 16 nodes. Scripts in turn
normally contain either MPI programs, or other simple invocations
of srun itself (as shown above). The srun -b option supports
basic, local batch service.
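As a rough illustration of a script built from simple srun invocations, the sketch below writes a hypothetical myscript.sh and prints the batch submission command from the example. The script contents (the cp and hostname steps) are assumptions for illustration only; the manual does not show what myscript.sh contains.

```shell
#!/bin/sh
# Write a hypothetical myscript.sh made of simple srun steps.
cat > myscript.sh <<'EOF'
#!/bin/sh
# Step 1: stage the input file onto local disk on the allocated nodes.
srun cp ~/data1 /var/tmp/data1
# Step 2: report which nodes the steps ran on.
srun hostname
EOF
chmod +x myscript.sh

# Submission command from the batch example; printed rather than run,
# so that the sketch also works on a machine without SLURM.
echo "srun -N 16 -b myscript.sh"
```

The srun lines inside the script carry no -N option here, on the assumption that they run within the resources granted to the batch job.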

Allocate Mode

When you need to combine the job complexity of scripts with
the immediacy of interactive execution, you can use the allocate
mode. For example:

$ srun -A -N 4 myscript.sh

This command uses the srun -A option to allocate specified
resources (four nodes in the above example), spawn a subshell with
access to those resources, and then run multiple jobs using simple
srun commands within the specified script (myscript.sh in the
above example) that the subshell immediately starts to execute. This
is similar to allocating resources by setting environment variables at
the beginning of a script, and then using them for scripted tasks.
No job queues are involved.

Attach

You can monitor or intervene in an already running srun job,
either batch (started with -b) or interactive (allocated, started
with -A), by executing srun again and attaching (-a) to that
job. For example:

$ srun -a 6543 -j
