
Figure 7-1: How LSF-HPC and SLURM Launch and Manage a Job

[Figure 7-1 shows the following sequence: the user on login node n16 submits
$ bsub -n4 -ext "SLURM[nodes=4]" -o output.out ./myscript
to the LSF execution host (lsfhost.localdomain). LSF-HPC allocates four compute nodes, n1 through n4, as SLURM job 53 (SLURM_JOBID=53, SLURM_NPROCS=4), and the job_starter.sh script uses srun to start myscript on the first allocated node, n1. From there, hostname, srun hostname, and mpirun -srun ./hellompi run across the allocated nodes.]

1.  A user logs in to login node n16.

2.  The user executes the following LSF bsub command on login node n16:

        $ bsub -n4 -ext "SLURM[nodes=4]" -o output.out ./myscript

    This bsub command requests four CPUs (from the -n4 option of the bsub command)
    spread across four nodes (from the -ext "SLURM[nodes=4]" option); the job is
    launched on those CPUs. The script, myscript, shown here, runs the job (a sketch
    of monitoring the submitted job from the LSF side follows step 3):

        #!/bin/sh
        hostname                  # runs once, on the first allocated node (n1)
        srun hostname             # runs on every node in the SLURM allocation
        mpirun -srun ./hellompi   # launches the MPI program across the allocation

3.  LSF-HPC schedules the job and monitors the state of the resources (compute nodes)
    in the SLURM lsf partition. When the LSF-HPC scheduler determines that the
    required resources are available, LSF-HPC allocates those resources in SLURM and
    obtains a SLURM job identifier (jobID) that corresponds to the allocation.

    In this example, four processors spread over four nodes (n1, n2, n3, n4) are
    allocated for myscript, and the SLURM job ID of 53 is assigned to the allocation.
    (A sketch of verifying this allocation on the SLURM side follows.)
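As referenced in step 2, the following is a minimal sketch of watching the submitted job from the LSF side. It assumes the job is still running and uses the LSF job ID that bsub reports at submission time (shown here as a placeholder):

    $ bjobs
    $ bpeek <LSF_jobID>
    $ cat output.out

bjobs lists the user's LSF jobs and their states (PEND, RUN, DONE), bpeek shows the output a running job has produced so far, and output.out is the file named by the -o option, written when the job completes.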
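As referenced in step 3, the following is a minimal sketch of confirming the allocation on the SLURM side. It assumes the SLURM job ID of 53 from this example and that the commands are run on a node where the SLURM utilities are available:

    $ sinfo -p lsf
    $ squeue -j 53

sinfo reports the state of the compute nodes in the lsf partition, and squeue shows SLURM job 53 together with its allocated node list (n1 through n4 in this example). Inside the job, the SLURM_JOBID and SLURM_NPROCS environment variables (53 and 4 here) identify the same allocation.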
