Examples of lsf job launch – HP XC System 4.x Software User Manual

“Translating SLURM and LSF JOBIDs” describes the relationship between the SLURM_JOBID and the LSF JOBID.
SLURM_NPROCS
This environment variable passes the total number of tasks requested with the bsub -n option along to all subsequent srun commands. User scripts can override this value with the srun -n option, but the new value must be less than or equal to the originally requested number of tasks.
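As a sketch of the override described above (the script, the setup step, and the task arithmetic are illustrative, not from the manual), a batch script can launch one step with fewer tasks than were requested at submission, as long as the count passed to srun -n does not exceed SLURM_NPROCS:

```shell
#!/bin/sh
# Hypothetical batch script submitted with: bsub -n8 ./my_script.sh
# SLURM_NPROCS carries the task count from bsub -n; default it to 8 here
# so the script can also be exercised outside an LSF allocation.
: "${SLURM_NPROCS:=8}"

# Run a setup step on half the requested tasks; the value given to
# srun -n must be less than or equal to SLURM_NPROCS.
HALF=$((SLURM_NPROCS / 2))
echo "requested=${SLURM_NPROCS} setup_tasks=${HALF}"

# srun -n "$HALF" ./setup_step   # commented out: requires a SLURM allocation
# srun ./main_step               # runs with all SLURM_NPROCS tasks
```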
LSF regards the entire HP XC system as a “SLURM machine.” LSF gathers resource information
from SLURM and creates SLURM allocations for each job. As a consequence, every LSF job has
a corresponding SLURM JOBID.
For a parallel job, LSF allocates multiple nodes for the job, but LSF always runs the batch script (or user command) on the first node. The batch script or the user command must start its tasks in parallel. The srun command is the SLURM “parallel launcher” command. HP-MPI uses the srun command through the mpirun -srun option.
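For example, an HP-MPI job can hand task launch to SLURM through the mpirun -srun option described above. This is a sketch: the application name ./my_mpi_app is illustrative, not from the manual.

```shell
# Request 4 cores from LSF; mpirun delegates launch to srun,
# which starts one rank on each allocated core:
bsub -n4 -I mpirun -srun ./my_mpi_app
```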
Example 10-1 Examples of LSF Job Launch
The following individual examples are run on a 4-node cluster with 2 cores per node:
[lsfadmin@n16 ~]$ sinfo
PARTITION AVAIL TIMELIMIT NODES STATE NODELIST
lsf up infinite 4 idle n[13-16]
This command line requests 4 cores, but runs the hostname command on the first node:
[lsfadmin@xc19n16 ~]$ bsub -n4 -I hostname
Job <110> is submitted to default queue
n13
The following command line requests 4 cores and uses the srun command to run the hostname command on all four cores:
[lsfadmin@n16 ~]$ bsub -n4 -I srun hostname
Job <111> is submitted to default queue
n13
n13
n14
n14
The following command line requests 4 cores across all 4 nodes and runs the hostname command
on each node:
[lsfadmin@n16 ~]$ bsub -n4 -I -ext "SLURM[nodes=4]" srun hostname
Job <112> is submitted to default queue
n13
n14
n15
n16
It is possible to set up your SSH keys to avoid password prompting so that you can use SSH-based parallel launchers such as the pdsh and mpirun commands. Use the LSB_HOSTS environment variable to pass the list of allocated nodes to the launcher.
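LSB_HOSTS repeats each host name once per allocated core, so an SSH-based launcher such as pdsh usually wants the list deduplicated first. A minimal sketch, assuming the 2-cores-per-node allocation from the examples above (the host names are illustrative):

```shell
#!/bin/sh
# LSB_HOSTS lists one entry per allocated core; simulate a two-node,
# 2-cores-per-node allocation for this example.
: "${LSB_HOSTS:=n13 n13 n14 n14}"

# Collapse duplicates into the comma-separated node list pdsh -w expects.
# The unquoted expansion deliberately splits LSB_HOSTS on whitespace.
NODELIST=$(printf '%s\n' $LSB_HOSTS | sort -u | paste -sd, -)
echo "$NODELIST"

# pdsh -w "$NODELIST" hostname   # commented out: needs passwordless SSH
```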
10.2 Overview of LSF Integrated with SLURM