
SLURM_JOBID

This environment variable is created so that subsequent srun commands
make use of the SLURM allocation created by LSF-HPC for the job. This
variable can be used by a job script to query information about the SLURM
allocation, as shown here:

$ squeue --jobs $SLURM_JOBID

“Translating SLURM and LSF-HPC JOBIDs” describes the relationship between
the SLURM_JOBID and the LSF-HPC JOBID.

SLURM_NPROCS

This environment variable passes along the total number of tasks requested
with the bsub -n command to all subsequent srun commands. User scripts
can override this value with the srun -n command, but the new value must
be less than or equal to the original number of requested tasks.
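
For example, a job script can run one step on the full allocation and a later step
on a subset of it. The following sketch assumes a hypothetical script, myscript.sh,
submitted with bsub -n8; the script name and task counts are illustrative only:

#!/bin/sh
# Submitted with: bsub -n8 ./myscript.sh
# SLURM_NPROCS is 8, so this step launches 8 tasks:
srun hostname
# This step overrides the task count with a smaller value (4 <= 8):
srun -n4 hostname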

LSF-HPC regards the entire HP XC system as a “SLURM machine.” LSF-HPC gathers resource
information from SLURM and creates SLURM allocations for each job. As a consequence, every
LSF job has a corresponding SLURM JOBID.

For a parallel job, LSF-HPC allocates multiple nodes for the job, but LSF-HPC always runs the
batch script (or user command) on the first node. The batch script or the user command must
start its tasks in parallel. The srun command is the SLURM “parallel launcher” command.
HP-MPI uses the srun command through the mpirun -srun option.
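
For example, an HP-MPI application can be submitted to LSF-HPC and launched with the
mpirun -srun option. In the following sketch, hello_mpi is a hypothetical MPI
executable and the core count is illustrative:

$ bsub -n4 -I mpirun -srun ./hello_mpi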

Example 10-1 Examples of LSF-HPC Job Launch

The following individual examples are run on a 4-node cluster with 2 cores per node:

[lsfadmin@n16 ~]$ sinfo
PARTITION AVAIL TIMELIMIT NODES STATE NODELIST
lsf up infinite 4 idle n[13-16]

The following command line requests 4 cores but runs the hostname command only on the first node:

[lsfadmin@xc19n16 ~]$ bsub -n4 -I hostname
Job <110> is submitted to default queue .
<>
<>
n13

The following command line requests 4 cores and uses the srun command to run the
hostname command on all four cores:

[lsfadmin@n16 ~]$ bsub -n4 -I srun hostname
Job <111> is submitted to default queue .
<>
<>
n13
n13
n14
n14

The following command line requests 4 cores across all 4 nodes and runs the hostname command
on each node:

[lsfadmin@n16 ~]$ bsub -n4 -I -ext "SLURM[nodes=4]" srun hostname
Job <112> is submitted to default queue .
<>
<>
n13
n14
n15
n16

It is possible to set up your SSH keys to avoid password prompting so that you can use
SSH-based parallel launchers such as the pdsh and mpirun commands. Use the LSB_HOSTS
environment variable to pass the list of allocated nodes to the launcher.
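
The following job script is a minimal sketch; it assumes passwordless SSH is already
configured and that the script is submitted as an LSF-HPC job, for example with
bsub -n4 -ext "SLURM[nodes=4]". The commands in the script are illustrative:

#!/bin/sh
# LSB_HOSTS contains one entry per allocated slot, so remove duplicate
# host names before building the comma-separated list that pdsh -w expects:
HOSTLIST=$(echo $LSB_HOSTS | tr ' ' '\n' | sort -u | tr '\n' ',' | sed 's/,$//')
pdsh -w "$HOSTLIST" hostname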
