1 introduction, 2 getting information about queues, 3 getting information about resources – HP XC System 4.x Software User Manual

overview of some basic ways of running and managing jobs. Full details about the HP XC job
launch environment are provided in “Using SLURM” and “Using LSF” in this document.

2.2.1 Introduction

As described in “Run-Time Environment” (page 25), SLURM and LSF cooperate to run and
manage jobs on the HP XC system, combining LSF's powerful and flexible scheduling functionality
with SLURM's scalable parallel job-launching capabilities.

SLURM is the low-level resource manager and job launcher, and performs core allocation for
jobs. LSF gathers information about the cluster from SLURM. When a job is ready to be launched,
LSF creates a SLURM node allocation and dispatches the job to that allocation.

Although you can launch jobs directly using SLURM, HP recommends that you use LSF to take
advantage of its scheduling and job management capabilities. You can add SLURM options to
the LSF job launch command line to further define job launch requirements. Use the HP-MPI
mpirun command and its options within LSF to launch jobs that require MPI's high-performance
message-passing capabilities.
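As a rough illustration of the command lines described above, the following sketch builds an LSF submission command and prints it rather than running it. The bsub options shown (-n for the core count, -ext "SLURM[...]" for passing SLURM options, -I for interactive use), the mpirun -srun option, and the ./hello_world program name are assumptions not stated in this section; check them against “Using LSF” and the HP-MPI documentation before use.

```shell
#!/bin/sh
# Sketch only: builds an LSF submission command line and prints it (dry run).
# Assumed, not taken from this section: the bsub options -n, -ext "SLURM[...]"
# and -I, the mpirun -srun option, and the hypothetical program ./hello_world.

build_bsub() {
    cores=$1       # number of cores to request through LSF
    slurm_opts=$2  # SLURM options such as nodes=4 (may be empty)
    shift 2        # the remaining arguments form the command to launch
    if [ -n "$slurm_opts" ]; then
        printf 'bsub -n %s -ext "SLURM[%s]" -I %s\n' "$cores" "$slurm_opts" "$*"
    else
        printf 'bsub -n %s -I %s\n' "$cores" "$*"
    fi
}

# Print two candidate command lines for review before submitting anything.
build_bsub 4 "nodes=4" srun hostname
build_bsub 4 "" mpirun -srun ./hello_world
```

Printing the command first makes it easy to confirm the options before a job is actually submitted.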

When the HP XC system is installed, a SLURM partition of nodes is created to contain LSF jobs.
This partition is called the lsf partition.

When a job is submitted to LSF, the LSF scheduler prioritizes the job and waits until the required
resources (compute nodes from the lsf partition) are available.

When the requested resources are available for the job, LSF creates a SLURM allocation of nodes
on behalf of the user, sets the SLURM JobID for the allocation, and dispatches the job with the
LSF JOB_STARTER script to the first allocated node.
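One way to observe the allocation from inside a job is to print the SLURM JobID that LSF set for it. The sketch below assumes that the JobID is exposed to the job through the SLURM_JOBID environment variable; the function name describe_allocation is hypothetical.

```shell
#!/bin/sh
# Sketch of a job script fragment.
# Assumption: the SLURM JobID set by LSF is visible as $SLURM_JOBID.

describe_allocation() {
    # Fall back to a placeholder when run outside a SLURM allocation.
    echo "SLURM JobID: ${SLURM_JOBID:-not set}"
}

describe_allocation
```

Run outside an allocation, the fragment reports that the variable is not set, which is a quick way to tell whether a script was dispatched through LSF.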

A detailed explanation of how SLURM and LSF interact to launch and manage jobs is provided
in “How LSF and SLURM Launch and Manage a Job” (page 92).

2.2.2 Getting Information About Queues

The LSF bqueues command lists the configured job queues in LSF. By default, bqueues returns
the following information about all queues:

Queue name

Queue priority

Queue status

Job slot statistics

Job state statistics

To get information about queues, enter the bqueues command as follows:

$ bqueues

For more information about using this command and a sample of its output, see “Examining
System Queues” (page 96).
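In scripts, it can help to confirm that the LSF commands are available before calling bqueues. The following is a minimal sketch; the wrapper name show_queues is hypothetical, and passing a queue name as an argument is an assumption about bqueues usage that is not covered in this section.

```shell
#!/bin/sh
# Sketch: run bqueues only when it can be found in the current environment.
# Assumption: bqueues accepts an optional queue name to limit its output.

show_queues() {
    if command -v bqueues >/dev/null 2>&1; then
        bqueues "$@"
    else
        echo "bqueues not found; is the LSF environment set up?" >&2
        return 1
    fi
}

# Typical use:
#   show_queues              # all queues
#   show_queues queue_name   # one queue (queue_name is a placeholder)
```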

2.2.3 Getting Information About Resources

The LSF bhosts, lshosts, and lsload commands are quick ways to get information about
system resources. LSF daemons run on only one node in the HP XC system, so the bhosts and
lshosts commands list one host, which represents all the resources of the HP XC system.

The total number of cores for that host should be equal to the total number of cores assigned to
the SLURM lsf partition.
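The core-count check described above can be partly scripted. The sketch below extracts the core count that LSF reports for its single host. It assumes that lshosts prints a header row containing an ncpus column followed by one host row; the function name lsf_core_count is hypothetical, so verify the output format on your system.

```shell
#!/bin/sh
# Sketch: print the core count LSF reports for its single host.
# Assumption: lshosts output starts with a header row that includes an
# "ncpus" column, followed by one row for the LSF host.

lsf_core_count() {
    # Reads lshosts-style output on stdin, finds the ncpus column by name,
    # and prints that column's value from the first host row.
    awk 'NR == 1 { for (i = 1; i <= NF; i++) if ($i == "ncpus") col = i }
         NR == 2 && col { print $col }'
}

# Typical use on a system with LSF installed:
#   lshosts | lsf_core_count
```

The printed value can then be compared with the core count of the SLURM lsf partition, for example as reported by the SLURM sinfo command for that partition.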

The LSF bhosts command provides a summary of the jobs on the system and information
about the current state of LSF.

$ bhosts
