9 determining the lsf execution host, 10 determining available system resources – HP XC System 4.x Software User Manual
4.
LSF prepares the user environment for the job on the LSF execution host node and dispatches
the job with the job_starter.sh script. This user environment includes standard LSF
environment variables and two SLURM-specific environment variables: SLURM_JOBID and
SLURM_NPROCS.
SLURM_JOBID is the SLURM job ID of the job. Note that this is not the same as the LSF
jobID. “Translating SLURM and LSF JOBIDs” describes the relationship between the
SLURM_JOBID and the LSF JOBID.
SLURM_NPROCS is the number of processes allocated.
These environment variables are intended for use by the user’s job, either explicitly
(user scripts may use these variables as necessary) or implicitly (the srun commands in the
user’s job use these variables to determine their allocation of resources).
In this example, the value of SLURM_NPROCS is 4 and the value of SLURM_JOBID is 53.
5.
The user job myscript begins execution on compute node n1.
The first line in myscript is the hostname command. It executes locally and returns the
name of the node, n1.
6.
The second line in the myscript script is the srun hostname command. The srun
command in myscript inherits SLURM_JOBID and SLURM_NPROCS from the environment
and executes the hostname command on each compute node in the allocation.
7.
The output of the hostname tasks (n1, n2, n3, and n4) is aggregated back to the srun
launch command (shown as dashed lines in the figure), and is ultimately returned to the srun
command in the job starter script, where it is collected by LSF.
The last line in myscript is the mpirun -srun ./hellompi command. The srun command
inside the mpirun command in myscript inherits the SLURM_JOBID and SLURM_NPROCS
environment variables from the environment and executes hellompi on each compute node in
the allocation.
The output of the hellompi tasks is aggregated back to the srun launch command where it is
collected by LSF.
The command executes on the allocated compute nodes n1, n2, n3, and n4.
When the job finishes, LSF cancels the SLURM allocation, which frees the compute nodes for use
by another job.
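The explicit and implicit uses of SLURM_JOBID and SLURM_NPROCS described in the steps above can be sketched as a small job script. This is an illustrative sketch only: report_allocation is a hypothetical helper name, not part of LSF or SLURM, and the guard around srun is an assumption added so that the script can also be run on a machine without SLURM.

```shell
#!/bin/sh
# Sketch of a job script that reads the two SLURM variables explicitly.
# report_allocation is a hypothetical helper, not an LSF or SLURM command.
report_allocation() {
    # ${VAR:-unset} keeps the helper usable outside an LSF-SLURM job.
    echo "SLURM job ${SLURM_JOBID:-unset}: ${SLURM_NPROCS:-unset} processes allocated"
}

# Explicit use: the script itself reads the variables.
report_allocation

# Implicit use: srun reads the same variables from the environment.
# The guard (an assumption of this sketch) skips srun when there is
# no allocation or when srun is not installed.
if [ -n "${SLURM_JOBID:-}" ] && command -v srun >/dev/null 2>&1; then
    srun hostname
fi
```

With the values from this example (SLURM_JOBID is 53 and SLURM_NPROCS is 4), the helper prints "SLURM job 53: 4 processes allocated".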
10.9 Determining the LSF Execution Host
The lsid command displays the name of the HP XC system, the name of the LSF execution host,
and some general LSF information.
$ lsid
Platform LSF HPC version, Update n, build date stamp
Copyright 1992-2008 Platform Computing Corporation
My cluster name is hptclsf
My master name is lsfhost.localdomain
In this example, hptclsf is the LSF cluster name (where the user is logged in and which contains
the compute nodes), and lsfhost.localdomain is the virtual IP name of the node where LSF
is installed and runs (the LSF execution host).
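If a script needs these two names, they can be pulled out of the lsid output with standard text tools. The following is a minimal sketch, assuming the output keeps the "My cluster name is" and "My master name is" line format shown in the example. The sample_lsid function is a hypothetical stand-in that replays the example text so the sketch runs without LSF; on a live system, pipe the output of lsid itself instead.

```shell
#!/bin/sh
# sample_lsid is a hypothetical stand-in that replays the example output;
# on a live system, use the real `lsid` command in its place.
sample_lsid() {
    cat <<'EOF'
Platform LSF HPC version, Update n, build date stamp
Copyright 1992-2008 Platform Computing Corporation
My cluster name is hptclsf
My master name is lsfhost.localdomain
EOF
}

# Take the last whitespace-separated field of each matching line.
cluster_name=$(sample_lsid | awk '/^My cluster name is/ {print $NF}')
exec_host=$(sample_lsid | awk '/^My master name is/ {print $NF}')

echo "LSF cluster name:   $cluster_name"
echo "LSF execution host: $exec_host"
```

Matching on the fixed line prefixes rather than on line position keeps the sketch working if the number of banner lines above them differs.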
10.10 Determining Available System Resources
For the best use of system resources when launching an application, it is useful to know the
system resources that are available for your use. This section describes how to obtain information
about system resources such as the number of cores available, LSF execution host node
information, and LSF system queues.