HP XC System 2.x Software User Manual
The srun command, used by the mpirun command to launch the MPI tasks in parallel, determines the number of tasks to launch from the SLURM_NPROCS environment variable that was set by LSF-HPC. Recall that the value of this environment variable is equivalent to the number provided by the -n option of the bsub command.
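
The relationship between the -n option and SLURM_NPROCS can be inspected from inside a job with a sketch like the following. The task_count helper and its fallback value of 1 are assumptions made for illustration so that the script also runs outside an allocation; they are not part of LSF-HPC or SLURM.

```shell
#!/bin/sh
# Report the task count that srun would read from the environment.
# SLURM_NPROCS is set by LSF-HPC inside a job; the fallback of 1 is an
# assumption so this sketch also runs outside an allocation.
task_count() {
    echo "${SLURM_NPROCS:-1}"
}

echo "SLURM_NPROCS indicates $(task_count) task(s)"
```

If this were saved under a hypothetical name such as show_tasks.sh and submitted with bsub -n4 -I ./show_tasks.sh, the description above suggests it would report 4.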
Consider an HP XC system configuration in which lsfhost.localdomain is the LSF execution host and nodes n[1-10] are compute nodes in the lsf partition. All nodes contain 2 processors, providing 20 processors for use by LSF jobs.
Example 7-6 runs a hello_world MPI program on four processors.
Example 7-6: Submitting an HP-MPI Job
$ bsub -n4 -I mpirun -srun ./hello_world
Job <75> is submitted to default queue
<<Waiting for dispatch ...>>
<<Starting on lsfhost.localdomain>>
Hello world! I’m 0 of 4 on n2
Hello world! I’m 1 of 4 on n2
Hello world! I’m 2 of 4 on n4
Hello world! I’m 3 of 4 on n4
Example 7-7 runs the same hello_world MPI program on four processors, but uses the external SLURM scheduler to request one task per node.
Example 7-7: Submitting an HP-MPI Job with a Specific Topology Request
$ bsub -n4 -ext "SLURM[nodes=4]" -I mpirun -srun ./hello_world
Job <77> is submitted to default queue
<<Waiting for dispatch ...>>
<<Starting on lsfhost.localdomain>>
Hello world! I’m 0 of 4 on n1
Hello world! I’m 1 of 4 on n2
Hello world! I’m 2 of 4 on n3
Hello world! I’m 3 of 4 on n4
If the MPI job requires an appfile, or cannot use the srun command as the task launcher for some other reason, some preprocessing is needed to determine the hostnames of the nodes on which mpirun's standard task launcher should launch the tasks. In such scenarios, you need to write a batch script; several methods are available for determining the nodes in an allocation. One is to use the SLURM_JOBID environment variable with the squeue command to query the nodes. Another is to use LSF environment variables such as LSB_HOSTS and LSB_MCPU_HOSTS, which are prepared by the HP XC job starter script.
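
As a minimal sketch of the second method, the following script turns the contents of LSB_MCPU_HOSTS into an HP-MPI appfile. It assumes LSB_MCPU_HOSTS holds alternating host names and processor counts (for example "n2 2 n4 2"). The make_appfile helper, the output file name appfile, and the default host list are hypothetical and are used only so the sketch can be tried outside LSF.

```shell
#!/bin/sh
# Convert "host1 count1 host2 count2 ..." into appfile lines of the form
#   -h host -np count program
make_appfile() {
    program=$1
    shift
    # The remaining arguments are host/count pairs.
    while [ "$#" -ge 2 ]; do
        echo "-h $1 -np $2 $program"
        shift 2
    done
}

# Inside a job, LSB_MCPU_HOSTS is already set. The default below is
# illustrative only, so the sketch can run outside an allocation.
hosts=${LSB_MCPU_HOSTS:-"n2 2 n4 2"}

# Word splitting of $hosts is intentional: each word becomes an argument.
make_appfile ./hello_world $hosts > appfile
cat appfile
```

Within a batch script, the generated file could then be passed to mpirun with its -f option (mpirun -f appfile) in place of the -srun launcher.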
7.4.6 Submitting a Batch Job or Job Script
The bsub command format to submit a batch job or job script is:
bsub -n num-procs [bsub-options] script-name
The -n num-procs parameter specifies the number of processors the job requests; it is required for parallel jobs. The script-name parameter is the name of the batch job or script. Any bsub options can be included. The script can contain one or more srun or mpirun commands and options.
The script will be executed once on the first allocated node, and any srun or mpirun commands within the script can use some or all of the allocated compute nodes.
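
The following sketch illustrates the kind of script this section describes. It writes a small job script to disk; the file name myscript.sh is hypothetical, and hello_world is the program from the earlier examples. Submitting the script requires an LSF-HPC environment, so the bsub line appears only as a comment.

```shell
#!/bin/sh
# Create a small job script (the name myscript.sh is hypothetical).
cat > myscript.sh <<'EOF'
#!/bin/sh
# This script itself runs once, on the first allocated node.
srun hostname                 # srun takes its task count from SLURM_NPROCS
mpirun -srun ./hello_world    # MPI tasks are launched through srun
EOF
chmod +x myscript.sh

# Submit the script, requesting four processors:
#   bsub -n 4 ./myscript.sh
```

Because the script body runs only once, any per-node or per-task work belongs inside the srun or mpirun commands rather than in the surrounding shell code.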