Examples of LSF Job Launch – HP XC System 4.x Software User Manual

“Translating SLURM and LSF JOBIDs” describes the relationship between the SLURM_JOBID and the
LSF JOBID.

SLURM_NPROCS

This environment variable passes along the total number of tasks requested
with the bsub -n command to all subsequent srun commands. User scripts
can override this value with the srun -n command, but the new value must
be less than or equal to the original number of requested tasks.
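
As a minimal sketch of this rule, a batch script could clamp a desired task count against
SLURM_NPROCS before calling srun. The helper name clamp_tasks and the WANTED variable are
hypothetical and not part of LSF or SLURM. The srun line is shown only as a comment because it
needs a running allocation.

```shell
#!/bin/sh
# Hypothetical helper: print the smaller of a desired task count ($1)
# and the job's original request ($2, normally $SLURM_NPROCS).
clamp_tasks() {
    wanted=$1
    limit=$2
    if [ "$wanted" -gt "$limit" ]; then
        echo "$limit"
    else
        echo "$wanted"
    fi
}

# Outside a job SLURM_NPROCS is unset, so fall back to 1 for illustration.
NPROCS=${SLURM_NPROCS:-1}
TASKS=$(clamp_tasks "${WANTED:-2}" "$NPROCS")
echo "would run: srun -n $TASKS hostname"
# Inside a real allocation the script would instead execute:
#   srun -n "$TASKS" hostname
```

Clamping in the script keeps the srun -n value within the limit described above, whatever task
count the script is asked for.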

LSF regards the entire HP XC system as a “SLURM machine.” LSF gathers resource information
from SLURM and creates SLURM allocations for each job. As a consequence, every LSF job has
a corresponding SLURM JOBID.

For a parallel job, LSF allocates multiple nodes for the job, but LSF always runs the batch script
(or user command) on the first node. The batch script or the user command must start its tasks
in parallel. The srun command is the SLURM “parallel launcher” command. HP-MPI uses the
srun command through the mpirun -srun option.

Example 10-1 Examples of LSF Job Launch

The following individual examples are run on a 4-node cluster with 2 cores per node:

[lsfadmin@n16 ~]$ sinfo
PARTITION AVAIL TIMELIMIT NODES STATE NODELIST
lsf up infinite 4 idle n[13-16]

The following command line requests 4 cores but runs the hostname command only on the first node:

[lsfadmin@xc19n16 ~]$ bsub -n4 -I hostname
Job <110> is submitted to default queue .
<>
<>
n13

The following command line requests 4 cores and uses the srun command to run the hostname
command on all four cores:

[lsfadmin@n16 ~]$ bsub -n4 -I srun hostname
Job <111> is submitted to default queue .
<>
<>
n13
n13
n14
n14

The following command line requests 4 cores across all 4 nodes and runs the hostname command
on each node:

[lsfadmin@n16 ~]$ bsub -n4 -I -ext "SLURM[nodes=4]" srun hostname
Job <112> is submitted to default queue .
<>
<>
n13
n14
n15
n16

You can set up your SSH keys to avoid password prompts, which lets you use SSH-based
parallel launchers such as the pdsh and mpirun commands. Use the LSB_HOSTS environment
variable to pass the list of allocated nodes to the launcher.
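
One way to hand the allocated nodes to pdsh is sketched below. It assumes that LSB_HOSTS holds
a space-separated list with one host name per allocated slot, so a host can appear more than
once. The helper name unique_hosts and the sample host list are hypothetical. The pdsh command
is only echoed, not executed.

```shell
#!/bin/sh
# Hypothetical helper: turn a space-separated host list (one entry per
# allocated slot, as assumed for LSB_HOSTS) into a comma-separated list
# of unique hosts, preserving first-seen order.
unique_hosts() {
    result=""
    for host in $1; do
        case ",$result," in
            *",$host,"*) ;;                          # already listed, skip
            *) result="${result:+$result,}$host" ;;  # append with comma separator
        esac
    done
    echo "$result"
}

# Sample value for illustration only; inside a job LSF sets LSB_HOSTS.
HOSTS=$(unique_hosts "${LSB_HOSTS:-n13 n13 n14 n14}")
echo "would run: pdsh -w $HOSTS hostname"
```

Removing duplicates means each node is contacted once. A launcher that expects one entry per
task would take the unmodified list instead.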

10.2 Overview of LSF Integrated with SLURM
