HP XC System 3.x Software User Manual
overview of some basic ways of running and managing jobs. Full information and details
about the HP XC job launch environment are provided in … and the LSF-HPC
section of … of this document.
2.2.1 Introduction
As described in “Run-Time Environment” (page 29), SLURM and LSF-HPC cooperate to run
and manage jobs on the HP XC system, combining LSF-HPC's powerful and flexible scheduling
functionality with SLURM's scalable parallel job-launching capabilities.
SLURM is the low-level resource manager and job launcher, and performs core allocation for
jobs. LSF-HPC gathers information about the cluster from SLURM. When a job is ready to be
launched, LSF-HPC creates a SLURM node allocation and dispatches the job to that allocation.
Although you can launch jobs directly using SLURM, HP recommends that you use LSF-HPC
to take advantage of its scheduling and job management capabilities. You can add SLURM
options to the LSF-HPC job launch command line to further define job launch requirements. Use
the HP-MPI mpirun command and its options within LSF-HPC to launch jobs that require MPI's
high-performance message-passing capabilities.
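The following sketch shows how such a submission might be assembled from a script. It only builds the argument list and submits nothing. It assumes LSF-HPC's `-ext "SLURM[...]"` option for passing SLURM options through `bsub`, and HP-MPI's `mpirun -srun` mode for starting the MPI ranks with SLURM; the function name `build_lsf_mpi_command` and the program `./hello_world` are hypothetical, so verify the exact option syntax against the LSF-HPC and HP-MPI documentation for your release.

```python
from typing import List, Optional


def build_lsf_mpi_command(
    ncores: int,
    program: List[str],
    nodes: Optional[int] = None,
    interactive: bool = False,
) -> List[str]:
    """Build, but do not run, a hypothetical LSF-HPC submission for an HP-MPI job.

    Assumed syntax (verify for your release):
        bsub -n <ncores> [-I] [-ext "SLURM[nodes=<nodes>]"] mpirun -srun <program...>
    """
    if ncores < 1:
        raise ValueError("ncores must be at least 1")
    if not program:
        raise ValueError("program must name an executable")
    cmd = ["bsub", "-n", str(ncores)]  # request ncores job slots from LSF-HPC
    if interactive:
        cmd.append("-I")  # interactive job: output comes back to the terminal
    if nodes is not None:
        # Assumption: SLURM options are passed through LSF-HPC's -ext option.
        cmd += ["-ext", f"SLURM[nodes={nodes}]"]
    # Assumption: HP-MPI's "mpirun -srun" starts the ranks with SLURM's srun.
    cmd += ["mpirun", "-srun", *program]
    return cmd
```

Because the result is an argument list, no shell quoting is needed around `SLURM[nodes=2]`. On a login node, the list could be passed to `subprocess.run(cmd, check=True)` to submit the job, or printed with `shlex.join(cmd)` for review first.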
When the HP XC system is installed, a SLURM partition of nodes is created to contain LSF-HPC
jobs. This partition is called the lsf partition.
When a job is submitted to LSF-HPC, the LSF-HPC scheduler prioritizes the job and waits until
the required resources (compute nodes from the lsf partition) are available.
When the requested resources are available for the job, LSF-HPC creates a SLURM allocation of
nodes on behalf of the user, sets the SLURM JobID for the allocation, and dispatches the job
with the LSF-HPC JOB_STARTER script to the first allocated node.
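To make the order of these steps concrete, here is a toy model. It is not how LSF-HPC or SLURM are implemented: `ToyScheduler`, `Dispatch`, and the node names are hypothetical, scheduling is strict priority order with no backfill, and nodes stand in for whatever resources a job requests. It only mirrors the sequence described above: a submitted job waits until enough nodes in the lsf partition are free, then receives an allocation with its own JobID and is dispatched to the first allocated node.

```python
import heapq
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Dispatch:
    job_name: str
    slurm_jobid: int   # JobID of the (simulated) SLURM allocation
    nodes: List[str]   # nodes taken from the (simulated) lsf partition
    first_node: str    # where the job starter script would run


class ToyScheduler:
    """Hypothetical model of the submit -> wait -> allocate -> dispatch sequence."""

    def __init__(self, lsf_partition_nodes: List[str]) -> None:
        self.free = list(lsf_partition_nodes)
        self.pending: List[Tuple[int, int, str, int]] = []  # (-priority, seq, name, nnodes)
        self.seq = 0
        self.next_jobid = 1

    def submit(self, name: str, nnodes: int, priority: int = 0) -> None:
        if nnodes < 1:
            raise ValueError("a job needs at least one node")
        # Higher priority first; ties are broken by submission order.
        heapq.heappush(self.pending, (-priority, self.seq, name, nnodes))
        self.seq += 1

    def schedule(self) -> List[Dispatch]:
        dispatched = []
        # Strict priority: the highest-priority job waits until its nodes are free.
        while self.pending and self.pending[0][3] <= len(self.free):
            _, _, name, nnodes = heapq.heappop(self.pending)
            nodes, self.free = self.free[:nnodes], self.free[nnodes:]
            dispatched.append(Dispatch(name, self.next_jobid, nodes, nodes[0]))
            self.next_jobid += 1
        return dispatched

    def release(self, d: Dispatch) -> None:
        # When a job finishes, its nodes return to the partition.
        self.free.extend(d.nodes)
```

For example, with a four-node partition, a higher-priority four-node job is dispatched first, and a lower-priority one-node job waits until those four nodes are released.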
A detailed explanation of how SLURM and LSF-HPC interact to launch and manage jobs is
provided in “How LSF-HPC and SLURM Launch and Manage a Job” (page 102).
2.2.2 Getting Information About Queues
The LSF bqueues command lists the configured job queues in LSF-HPC. By default, bqueues
returns the following information about all queues:
• Queue name
• Queue priority
• Queue status
• Job slot statistics
• Job state statistics
To get information about queues, enter the bqueues command as follows:
$ bqueues
For more information about using this command and a sample of its output, see …
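If you want to use this information from a script, a small parser can turn the tabular output into records. The sketch below assumes that the first non-blank line of the output is the column header and that every field is a single whitespace-free token, which should hold for the default bqueues columns (typically QUEUE_NAME, PRIO, STATUS, MAX, JL/U, JL/P, JL/H, NJOBS, PEND, RUN, and SUSP) but may not hold for customized output. The function names `parse_table` and `bqueues` are hypothetical.

```python
import subprocess
from typing import Dict, List


def parse_table(text: str) -> List[Dict[str, str]]:
    """Parse whitespace-separated LSF table output into dicts keyed by the header row.

    Assumption: each field is one token with no embedded spaces.
    """
    lines = [ln for ln in text.splitlines() if ln.strip()]
    if not lines:
        return []
    header = lines[0].split()
    rows = []
    for ln in lines[1:]:
        fields = ln.split()
        if len(fields) != len(header):
            raise ValueError(f"unexpected row: {ln!r}")
        rows.append(dict(zip(header, fields)))
    return rows


def bqueues() -> List[Dict[str, str]]:
    # Runs the real command; only works where LSF-HPC is installed.
    out = subprocess.run(["bqueues"], capture_output=True, text=True, check=True)
    return parse_table(out.stdout)
```

On a system where LSF-HPC is installed, `for q in bqueues(): print(q["QUEUE_NAME"], q["PEND"])` would list the number of pending jobs in each queue. Values are kept as strings because some columns use `-` to mean "no limit".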
2.2.3 Getting Information About Resources
The LSF bhosts, lshosts, and lsload commands are quick ways to get information about
system resources. LSF-HPC daemons run on only one node in the HP XC system, so the bhosts
and lshosts commands list a single host, which represents all the resources of the HP XC
system. The total number of cores for that host should equal the total number of cores
assigned to the SLURM lsf partition.
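The consistency check described above can be scripted. This is a minimal sketch, assuming that the MAX job-slot column of `bhosts` reflects the LSF-HPC host's core count (this depends on how LSF-HPC is configured) and that `sinfo -p lsf -h -N -o "%N %c"` prints one line per node in the lsf partition followed by its CPU count. The function names are hypothetical; treat this as a starting point rather than a supported tool.

```python
import subprocess
from typing import List


def lsf_host_max_slots(bhosts_text: str) -> int:
    """Return the MAX job-slot value for the single host listed by bhosts."""
    rows = [ln.split() for ln in bhosts_text.splitlines() if ln.strip()]
    if len(rows) < 2:
        raise ValueError("no host rows in bhosts output")
    header, hosts = rows[0], rows[1:]
    # On HP XC, LSF-HPC is expected to report exactly one host for the whole system.
    if len(hosts) != 1:
        raise ValueError(f"expected one LSF-HPC host, found {len(hosts)}")
    # Assumption: MAX equals the core count; int() raises ValueError if MAX is "-".
    return int(hosts[0][header.index("MAX")])


def slurm_partition_cores(sinfo_text: str) -> int:
    """Sum per-node CPU counts printed by: sinfo -p lsf -h -N -o "%N %c"."""
    total = 0
    for ln in sinfo_text.splitlines():
        if ln.strip():
            _node, cpus = ln.split()
            total += int(cpus)
    return total


def _run(cmd: List[str]) -> str:
    return subprocess.run(cmd, capture_output=True, text=True, check=True).stdout


def lsf_matches_slurm() -> bool:
    # Needs the LSF-HPC and SLURM commands on PATH, e.g. on an HP XC login node.
    slots = lsf_host_max_slots(_run(["bhosts"]))
    cores = slurm_partition_cores(
        _run(["sinfo", "-p", "lsf", "-h", "-N", "-o", "%N %c"]))
    return slots == cores
```

A `False` result from `lsf_matches_slurm()` means the two views of the system disagree, which, per the text above, should not normally happen.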
• The LSF bhosts command provides a summary of the jobs on the system and information
about the current state of LSF-HPC.
$ bhosts