1 srun roles, 2 srun modes – HP XC System 2.x Software User Manual
6.4.1.1 srun Roles
srun options allow you to submit a job by:
• Specifying the parallel environment for your job, such as the number of nodes to use, the partition, the distribution of processes among nodes, and the maximum time.
• Controlling the behavior of your parallel job as it runs, such as by redirecting or labeling its output, sending it signals, or specifying its reporting verbosity.
6.4.1.2 srun Modes
Because srun performs several different roles, it has five distinct ways, or modes, in which it can be used:
Simple Mode
In simple mode, srun submits your job to the local SLURM job controller, blocks if necessary until the needed resources are free, and then initiates all processes on the specified nodes. Many control options can change the details of this general pattern.
The simplest way to use the srun command is to distribute the execution of a serial program (such as a Linux utility) across a specified number or range of compute nodes. For example:
$ srun -N 8 cp ~/data1 /var/tmp/data1
This command uses cp to copy the file data1 from your common home directory into local disk space on each of eight compute nodes. This is similar to running simple programs in parallel.
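The simple-mode pattern above can be sketched as a small shell helper. This is only an illustration: the function name simple_srun and the DRY_RUN variable are hypothetical and are not part of srun or SLURM; only the -N option and the cp command come from the example above. With DRY_RUN left at 1, the helper prints the command instead of running it, so the sketch can be tried on a machine without SLURM.

```shell
#!/bin/sh
# Hypothetical helper (not part of SLURM): wrap a simple-mode srun call.
# DRY_RUN=1 (the default in this sketch) prints the command instead of running it.
DRY_RUN=${DRY_RUN:-1}

simple_srun() {
    nodes=$1      # number of compute nodes, passed to srun -N
    shift         # the remaining arguments are the serial program and its arguments
    if [ "$DRY_RUN" = 1 ]; then
        echo "srun -N $nodes $*"
    else
        srun -N "$nodes" "$@"
    fi
}

# The same copy as in the example above, on eight nodes.
simple_srun 8 cp "$HOME/data1" /var/tmp/data1
```

Setting DRY_RUN=0 makes the helper call srun directly, which requires a working SLURM installation.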
Batch Mode
srun can also directly submit complex scripts to the job queues managed by SLURM for later execution, when the needed resources become available and no higher-priority jobs are pending. For example:
$ srun -N 16 -b myscript.sh
This command uses the srun -b option to place myscript.sh into the batch queue to run later on 16 nodes. Scripts, in turn, normally contain either MPI programs or other simple invocations of srun itself (as shown above). The srun -b option supports basic, local batch service.
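As a sketch of what such a script might contain, the following creates a hypothetical myscript.sh made up of simple srun invocations. The commands inside the script are illustrative assumptions, not taken from the manual, and the sketch assumes that srun calls inside the script run on the nodes given to the batch job. The script is only written to disk here; it is not submitted.

```shell
#!/bin/sh
# Write a hypothetical batch script; its contents are illustrative only.
cat > myscript.sh <<'EOF'
#!/bin/sh
# Each srun line is a simple-mode invocation.
srun hostname
srun cp ~/data1 /var/tmp/data1
EOF
chmod +x myscript.sh

# To queue it for later execution on 16 nodes (requires SLURM):
#   srun -N 16 -b myscript.sh
```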
Allocate Mode
When you need to combine the job complexity of scripts with the immediacy of interactive execution, you can use the allocate mode. For example:
$ srun -A -N 4 myscript.sh
This command uses the srun -A option to allocate the specified resources (four nodes in this example) and spawn a subshell with access to those resources. The subshell immediately starts to execute the specified script (myscript.sh in this example), which can run multiple jobs using simple srun commands. This is similar to allocating resources by setting environment variables at the beginning of a script and then using them for scripted tasks. No job queues are involved.
Attach
You can monitor or intervene in an already running srun job, either batch (started with -b) or interactive (allocated, that is, started with -A), by executing srun again and attaching (-a) to that job. For example:
$ srun -a 6543 -j