
The -n option of the bsub command allows you to request more than one core for a job. This option, coupled with the external SLURM scheduler discussed in “LSF-SLURM External Scheduler”, gives you much flexibility in selecting resources and shaping how the job is executed on those resources.
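
For example, a minimal submission that requests four cores might look like the following sketch (the job script and output file names are hypothetical):

    $ bsub -n4 -o myjob.out ./myjob.sh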

When you request multiple nodes, LSF reserves the requested number of nodes and executes one instance of the job on the first reserved node. Use the srun command or the mpirun command with the -srun option in your jobs to launch parallel applications. The -srun option can be set implicitly for the mpirun command; see “Submitting a Parallel Job That Uses the HP-MPI Message Passing Interface” for more information on using the mpirun -srun command.
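
The following submissions are illustrative sketches (the application name is hypothetical): the first runs one instance of hostname on each reserved core through srun, and the second launches an HP-MPI application with mpirun -srun:

    $ bsub -n4 -I srun hostname
    $ bsub -n8 mpirun -srun ./my_mpi_app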

Most parallel applications rely on rsh or ssh to "launch" remote tasks. The ssh utility is installed on the HP XC system by default. If you configured the ssh keys to allow unprompted access to other nodes in the HP XC system, the parallel applications can use ssh. See “Enabling Remote Execution with OpenSSH” for more information on ssh.
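
As a rough sketch, unprompted access is commonly arranged by generating a passphrase-less key pair and appending the public key to your authorized keys file (this assumes your home directory is shared across nodes); see “Enabling Remote Execution with OpenSSH” for the procedure specific to the HP XC system:

    $ ssh-keygen -t rsa -N "" -f ~/.ssh/id_rsa
    $ cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
    $ chmod 600 ~/.ssh/authorized_keys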

The following table shows exit codes for jobs launched under LSF integrated with SLURM:

Table 10-1 LSF with SLURM Job Launch Exit Codes

Exit Code    Description
0            Success
124          There was a job launch error in SLURM
125          There was a job launch error in HPC-LSF
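
Because an interactive bsub submission returns the job's exit status to the shell, you can inspect these codes directly after the job completes; for example (a hypothetical interactive submission):

    $ bsub -n4 -I srun hostname
    $ echo $?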

10.7 LSF-SLURM External Scheduler

The external scheduler option is an important option that can be included when submitting parallel jobs with LSF integrated with SLURM. This option:

•   Provides application-specific external scheduling capabilities for jobs.

•   Lets you include several SLURM options in the LSF command line.

For example, you can submit a job to run one task per node when you have a resource-intensive
job that needs to have sole access to the node's full resources. If your job needs particular resources
found only on a specific set of nodes, you can use this option to submit a job to those nodes.
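
As an illustration, the following sketches pass SLURM allocation options through the external scheduler (the node names and option strings are examples only; consult your LSF documentation for the exact syntax supported by your version):

    $ bsub -n4 -ext "SLURM[nodes=4]" -I srun hostname
    $ bsub -n2 -ext "SLURM[nodelist=n6,n8]" -I srun hostname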

The LSF host options enable you to identify an HP XC system "host" within a larger LSF cluster. After the HP XC system is selected, LSF's external SLURM scheduler provides the additional flexibility to request specific resources within the HP XC system.

You can use the LSF external scheduler functionality within the bsub command and in LSF
queue configurations. See the LSF bqueues(1) command for more information on determining
how the available queues are configured on HP XC systems.
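
For example, you can display the full configuration of a queue, including any external scheduler options it sets (the queue name here is hypothetical):

    $ bqueues -l dualcore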

See “Submitting a Parallel Job Using the SLURM External Scheduler” for information and examples on submitting jobs with the LSF-SLURM External Scheduler.

10.8 How LSF and SLURM Launch and Manage a Job

This section describes what happens in the HP XC system when a job is submitted to LSF. Figure 10-1 illustrates this process. Use the numbered steps in the text and in the illustration as an aid to understanding the process.

Consider the HP XC system configuration shown in Figure 10-1, in which lsfhost.localdomain is the virtual IP name assigned to the LSF execution host, node n16 is the login node, and nodes n[1-10] are compute nodes in the lsf partition. All nodes contain two cores, providing 20 cores for use by LSF jobs.
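
On such a configuration you could, for example, confirm the state of the compute nodes in the lsf partition with the standard SLURM sinfo command:

    $ sinfo -p lsf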
