To illustrate how the external scheduler is used to launch an application, consider the following
command line, which launches an application on ten nodes with one task per node:
$ bsub -n 10 -ext "SLURM[nodes=10]" srun my_app
The following command line launches the same application, also on ten nodes, but stipulates that node n16 should not be used:
$ bsub -n 10 -ext "SLURM[nodes=10;exclude=n16]" srun my_app
7.1.3 Notes on LSF-HPC
The following are noteworthy items for users of LSF-HPC on HP XC systems:
• You must run jobs as a non-root user, such as lsfadmin or any other local user; do not run jobs as the root user.
• A SLURM partition named lsf is used to manage LSF jobs. You can view information about this partition with the sinfo command.
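For example, a command such as the following might be used to display only the lsf partition (the -p option restricts sinfo output to the named partition; the actual output depends on your configuration):
$ sinfo -p lsf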
• LSF daemons run on only one node in the HP XC system. As a result, the lshosts and bhosts commands list only one host, which represents all the resources of the HP XC system. The total number of processors reported for that host should equal the total number of processors in the nodes assigned to the SLURM lsf partition.
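For example, you might review that single host and its processor count with the following commands (the host name and counts shown depend on your system configuration):
$ lshosts
$ bhosts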
• When a job is submitted and the resources are available, LSF-HPC creates a properly sized SLURM allocation and adds several standard LSF environment variables to the environment in which the job is to be run. The following two environment variables are also added:

SLURM_JOBID     This environment variable is created so that subsequent srun commands make use of the SLURM allocation created by LSF-HPC for the job. This variable can be used by a job script to query information about the SLURM allocation, as shown here:

$ squeue --jobs $SLURM_JOBID

SLURM_NPROCS    This environment variable passes along the total number of tasks requested with the bsub -n command to all subsequent srun commands. User scripts can override this value with the srun -n option, but the new value must be less than or equal to the original number of requested tasks.
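As an illustrative sketch, a job script submitted with bsub -n 8 might use these variables as follows (my_setup and my_app are hypothetical program names):

#!/bin/sh
# Query the SLURM allocation that LSF-HPC created for this job.
squeue --jobs $SLURM_JOBID
# Run a smaller setup step on two tasks; the -n value must not
# exceed the number of tasks originally requested with bsub -n.
srun -n 2 my_setup
# Launch the application on all requested tasks; srun uses the
# SLURM_NPROCS value set by LSF-HPC.
srun my_app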
• LSF-HPC dispatches all jobs locally. The default installation of LSF-HPC for SLURM on the HP XC system provides a job starter script that is configured for use by all LSF-HPC queues. This job starter script adjusts the LSB_HOSTS and LSB_MCPU_HOSTS environment variables to the correct resource values in the allocation. Then, the job starter script uses the srun command to launch the user task on the first node in the allocation.

If this job starter script is not configured for a queue, user jobs begin execution locally on the LSF-HPC execution host. In this case, it is recommended that the user job use one or more srun commands to make use of the resources allocated to the job. Work done on the LSF-HPC execution host competes for CPU time with the LSF-HPC daemons and can affect the overall performance of LSF-HPC on the HP XC system.

The bqueues -l command displays the full queue configuration, including whether or not a job starter script has been configured. See the Platform LSF documentation or the bqueues(1) manpage for more information on the use of this command.
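For example, to check whether a job starter script is configured for a particular queue, such as the default normal queue, you might run:
$ bqueues -l normal
If a job starter script is configured for the queue, an entry identifying it typically appears in the detailed output.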
For example, consider an LSF-HPC configuration in which node n20 is the LSF-HPC execution host and nodes n[1-10] are in the SLURM lsf partition. The default normal