HP XC System 4.x Software User Manual

1.5.4 How LSF and SLURM Interact

In the HP XC environment, LSF cooperates with SLURM to combine the powerful scheduling
functionality of LSF with the scalable parallel job launching capabilities of SLURM. LSF acts
primarily as a workload scheduler on top of the SLURM system, providing policy and
topology-based scheduling for end users. SLURM provides an execution and monitoring layer
for LSF. LSF uses SLURM to detect system topology information, make scheduling decisions,
and launch jobs on allocated resources.

When a job is submitted to LSF, LSF schedules the job based on job resource requirements. LSF
communicates with SLURM to allocate the required HP XC compute nodes for the job from the
SLURM lsf partition. LSF provides node-level scheduling for parallel jobs, and core-level
scheduling for serial jobs. Because of node-level scheduling, a parallel job may be allocated more
cores than it requested, depending on its resource request; the srun or mpirun -srun launch
commands within the job still honor the original request. LSF always tries to pack multiple serial
jobs on the same node, with one core per job. Parallel jobs and serial jobs cannot coexist on the
same node.
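The effect of node-level scheduling can be seen with a small arithmetic sketch. The numbers below are hypothetical and not taken from any particular HP XC configuration:

```shell
# Hypothetical example: a parallel job requests 6 cores on nodes
# that each have 4 cores. LSF allocates whole nodes to parallel jobs.
requested_cores=6
cores_per_node=4

# Round up to a whole number of nodes.
nodes=$(( (requested_cores + cores_per_node - 1) / cores_per_node ))
allocated_cores=$(( nodes * cores_per_node ))

echo "nodes allocated: $nodes"            # 2 nodes
echo "cores allocated: $allocated_cores"  # 8 cores, although only 6 were requested
# srun (or mpirun -srun) inside the job still launches only the 6 requested tasks.
```

The two "extra" cores on the second node are allocated to the job but left idle by the launch commands, which honor the original request.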

After the LSF scheduler allocates the SLURM resources for a job, the SLURM allocation information
is recorded with the job. You can view this information with the bjobs and bhist commands.

When LSF starts a job, it sets the SLURM_JOBID and SLURM_NPROCS environment variables in
the job environment. SLURM_JOBID associates the LSF job with SLURM's allocated resources.
The SLURM_NPROCS environment variable is set to the originally requested number of cores.
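A job script can read these variables directly. The sketch below supplies illustrative fallback values so that it also runs outside a real LSF/SLURM job; inside a dispatched job, the variables are already set by LSF:

```shell
# SLURM_JOBID ties the LSF job to its SLURM allocation;
# SLURM_NPROCS is the originally requested number of cores.
# The default values below are hypothetical, for illustration only.
: "${SLURM_JOBID:=1001}"
: "${SLURM_NPROCS:=4}"

echo "SLURM allocation: job $SLURM_JOBID, $SLURM_NPROCS requested cores"
```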
LSF dispatches the job from the LSF execution host, which is the same node on which the LSF daemons run. The LSF JOB_STARTER script, which is configured for all queues, uses the srun command to launch the user job on the first node in the allocation. Your job can contain additional srun or mpirun commands to launch tasks on all nodes in the allocation.
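JOB_STARTER is a standard LSF queue parameter. A minimal sketch of a queue definition that launches jobs through srun might look like the following; the queue name and description are assumptions, and the actual HP XC configuration may use a more elaborate starter script:

```
Begin Queue
QUEUE_NAME  = normal
JOB_STARTER = srun
DESCRIPTION = Example queue that launches jobs through srun
End Queue
```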

While a job is running, all resource limits supported by LSF are enforced, including the core limit, CPU time limit, data limit, file size limit, memory limit, and stack limit. When you terminate a job, LSF uses the SLURM scancel command to propagate the signal to the entire job.

After a job finishes, LSF releases all allocated resources.

A detailed description of how LSF and SLURM cooperate to launch and manage jobs, along with an example and illustration, is provided in “How LSF and SLURM Launch and Manage a Job”. It is highly recommended that you review this information.

In summary, and in general:

LSF

Determines WHEN and WHERE the job will run. LSF communicates with SLURM
to determine WHICH resources are available, and SELECTS the appropriate set of
nodes for the job.

SLURM

Allocates nodes for jobs as determined by LSF. It CONTROLS task/rank distribution
within the allocated nodes. SLURM also starts the executables on each host as
requested by the HP-MPI mpirun command.

HP-MPI

Determines HOW the job runs. It is part of the application, so it performs the interprocess communication. HP-MPI can also pinpoint the processor on which each rank runs.

1.5.5 HP-MPI

HP-MPI is a high-performance implementation of the Message Passing Interface (MPI) standard and is included with the HP XC system. HP-MPI uses SLURM to launch jobs on an HP XC system; however, it manages the global MPI exchange so that all processes can communicate with each other.
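For example, a submission of an HP-MPI program through LSF might look like the following transcript. The program name a.out and the core count are hypothetical; mpirun -srun is the launch form described above:

```
$ bsub -n 8 -I mpirun -srun ./a.out
```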

See the HP-MPI documentation for more information.

Overview of the User Environment