
4.

LSF-HPC prepares the user environment for the job on the LSF execution host node and
dispatches the job with the job_starter.sh script. This user environment includes
standard LSF environment variables and two SLURM-specific environment variables:
SLURM_JOBID and SLURM_NPROCS.

SLURM_JOBID is the SLURM job ID of the job. Note that this is not the same as the
LSF-HPC JOBID; "Translating SLURM and LSF-HPC JOBIDs" describes the relationship
between the SLURM_JOBID and the LSF-HPC JOBID.

SLURM_NPROCS is the number of processes allocated.

These environment variables are intended for use by the user's job, whether explicitly
(user scripts may use these variables as necessary; see the sketch after this list) or
implicitly (the srun commands in the user's job use these variables to determine their
allocation of resources).

In this example, the value of SLURM_NPROCS is 4 and the value of SLURM_JOBID is 53.

5.

The user job myscript begins execution on compute node n1.

The first line in myscript is the hostname command. It executes locally and returns the
name of the node, n1.

6.

The second line in the myscript script is the srun hostname command. The srun
command in myscript inherits SLURM_JOBID and SLURM_NPROCS from the environment
and executes the hostname command on each compute node in the allocation.

7.

The output of the hostname tasks (n1, n2, n3, and n4) is aggregated back to the srun
launch command (shown as dashed lines in Figure 10-1) and is ultimately returned to the
srun command in the job starter script, where it is collected by LSF-HPC.

8.

The last line in myscript is the mpirun -srun ./hellompi command. The srun command
inside the mpirun command inherits the SLURM_JOBID and SLURM_NPROCS environment
variables from the environment and executes hellompi on each of the allocated compute
nodes, n1, n2, n3, and n4.

The output of the hellompi tasks is aggregated back to the srun launch command, where it
is collected by LSF-HPC.

9.

When the job finishes, LSF-HPC cancels the SLURM allocation, which frees the compute nodes
for use by another job.
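
Taken together, these steps imply a job script along the following lines. This is a
hypothetical sketch reconstructed from the description above, not a listing from this
manual; the echo line is an added illustration of explicit use of the SLURM variables.

#!/bin/sh
# myscript (hypothetical reconstruction based on the steps above)
echo "job $SLURM_JOBID, $SLURM_NPROCS processes"   # explicit use of the SLURM variables (illustrative)
hostname                  # executes locally and prints the name of the first node, n1
srun hostname             # prints the name of every compute node in the allocation
mpirun -srun ./hellompi   # launches hellompi on all allocated compute nodes

Submitted with a command such as bsub -I -n4 ./myscript (-I and -n are standard LSF
bsub options), this script prints n1, then the names of all four allocated nodes, then
the output of hellompi.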

10.9 Determining the LSF Execution Host

The lsid command displays the name of the HP XC system, the name of the LSF execution host,
and some general LSF-HPC information.

$ lsid
Platform LSF HPC version number for SLURM, date and time stamp
Copyright 1992-2005 Platform Computing Corporation

My cluster name is hptclsf
My master name is lsfhost.localdomain

In this example, hptclsf is the LSF cluster name (the cluster where the user is logged in
and which contains the compute nodes), and lsfhost.localdomain is the virtual IP name of
the node where LSF-HPC is installed and runs (the LSF execution host).

10.10 Determining Available System Resources

To make the best use of system resources when launching an application, it is useful to
know what resources are available to you. This section describes how to obtain information
about system resources such as the number of cores available, LSF execution host node
information, and LSF-HPC system queues.
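
As a starting point, standard LSF commands report much of this information. The following
is a minimal sketch, assuming a standard LSF-HPC installation; the exact output varies by
system:

$ lshosts    # static host information, including the number of cores
$ bhosts     # batch host status and job slot limits
$ bqueues    # the queues configured on the system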
