
2 serial applications, 5 run-time environment, 1 slurm – HP XC System 4.x Software User Manual


1.4.2 Serial Applications

You can build and run serial applications under the HP XC development environment. A serial
application is a command or application that does not use any form of parallelism.

Full details and examples of how to build, run, debug, and troubleshoot serial applications are
provided in “Building Serial Applications”.

1.5 Run-Time Environment

This section describes LSF, SLURM, and HP-MPI, and how these components work together to
provide the HP XC run-time environment. LSF focuses on scheduling and managing the
workload, and SLURM provides efficient and scalable resource management of the compute
nodes.

An alternative HP XC environment features standard LSF, which runs without interacting with the
SLURM resource manager.

1.5.1 SLURM

Simple Linux Utility for Resource Management (SLURM) is a resource management system that
is integrated into the HP XC system. SLURM is suitable for use on large and small Linux clusters.
It was developed by Lawrence Livermore National Laboratory and Linux NetworX. As a resource
manager, SLURM allocates exclusive or unrestricted access to resources (application and compute
nodes) for users to perform work, and provides a framework to start, execute and monitor work
(normally a parallel job) on the set of allocated nodes.

A SLURM system consists of two daemons, one configuration file, and a set of commands and
APIs. The central controller daemon, slurmctld, maintains the global state and directs
operations. A slurmd daemon is deployed to each computing node and responds to job-related
requests, such as launching, signalling, and terminating jobs. End users and system software
(such as LSF) communicate with SLURM by means of commands or APIs, for example, to
allocate resources, launch parallel jobs on allocated resources, and terminate running
jobs.

SLURM groups compute nodes (the nodes where jobs are run) together into “partitions”. The
HP XC system can have one or several partitions. When HP XC is installed, a single partition of
compute nodes is created by default for LSF batch jobs. The system administrator has the option
of creating additional partitions. For example, another partition could be created for interactive
jobs.
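
The relationship between partitions and exclusive node allocation can be illustrated with a small Python model. This is only a sketch for intuition, not SLURM code or its API: the Partition class, the partition names lsf and interactive, and the node names n1 through n10 are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Partition:
    """Toy model of a partition: a named group of compute nodes."""
    name: str
    nodes: list[str]
    allocated: set[str] = field(default_factory=set)

    def free_nodes(self) -> list[str]:
        # Nodes in this partition that no job currently holds.
        return [n for n in self.nodes if n not in self.allocated]

    def allocate(self, count: int) -> list[str]:
        """Grant exclusive access to `count` free nodes, or raise if too few."""
        free = self.free_nodes()
        if count > len(free):
            raise RuntimeError(
                f"partition {self.name}: only {len(free)} free nodes"
            )
        granted = free[:count]
        self.allocated.update(granted)
        return granted

    def release(self, nodes: list[str]) -> None:
        # Return nodes to the free pool when the job ends.
        self.allocated.difference_update(nodes)


# Hypothetical layout: one default partition for LSF batch jobs and one
# administrator-created partition for interactive jobs.
partitions = {
    "lsf": Partition("lsf", [f"n{i}" for i in range(1, 9)]),
    "interactive": Partition("interactive", ["n9", "n10"]),
}

job_nodes = partitions["lsf"].allocate(4)
free_after = len(partitions["lsf"].free_nodes())
```

In this model a node belongs to exactly one partition and can be held by only one job at a time, which mirrors the exclusive-access case described above. A real site may configure partitions differently.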

1.5.2 Load Sharing Facility (LSF)

The Load Sharing Facility (LSF) from Platform Computing, Inc. is a batch system resource manager
that has been integrated with SLURM for use on the HP XC system. LSF for SLURM is included
with the HP XC System Software, and is an integral part of the HP XC environment. LSF interacts
with SLURM to obtain and allocate available resources, and to launch and control all the jobs
submitted to LSF. LSF accepts, queues, schedules, dispatches, and controls all the batch jobs that
users submit, according to policies and configurations established by the HP XC site administrator.
On an HP XC system, LSF for SLURM is installed and runs on one HP XC node, known as the
LSF execution host.
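
The accept, queue, schedule, and dispatch cycle described above can be sketched as a toy first-come, first-served scheduler in Python. This is an illustration under simplifying assumptions, not how LSF is implemented: the ToyScheduler class, the job names, and the strict FIFO policy are hypothetical, and real LSF policies are set by the site administrator.

```python
from collections import deque


class ToyScheduler:
    """Toy batch scheduler: jobs wait in a FIFO queue until nodes are free."""

    def __init__(self, total_nodes: int) -> None:
        self.free_nodes = total_nodes
        self.pending = deque()   # (job_id, nodes_needed) in submission order
        self.running = {}        # job_id -> nodes held

    def submit(self, job_id: str, nodes_needed: int) -> None:
        # Accept the job and place it at the end of the queue.
        self.pending.append((job_id, nodes_needed))

    def dispatch(self) -> list[str]:
        """Start queued jobs in order while the head of the queue fits."""
        started = []
        while self.pending and self.pending[0][1] <= self.free_nodes:
            job_id, needed = self.pending.popleft()
            self.free_nodes -= needed   # stands in for a resource manager allocation
            self.running[job_id] = needed
            started.append(job_id)
        return started

    def finish(self, job_id: str) -> None:
        # Return the job's nodes to the free pool.
        self.free_nodes += self.running.pop(job_id)


sched = ToyScheduler(total_nodes=4)
sched.submit("job1", 3)
sched.submit("job2", 2)
first = sched.dispatch()    # job1 starts; job2 waits for free nodes
sched.finish("job1")
second = sched.dispatch()   # job2 starts once job1 has released its nodes
```

The free-node counter stands in for the role SLURM plays on an HP XC system: the scheduler decides which job goes next, and the resource manager supplies the nodes.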

A complete description of LSF is provided in Chapter 10 “Using LSF”. In addition, for your
convenience, the HP XC Documentation CD contains LSF manuals from Platform Computing.

1.5.3 Standard LSF

Standard LSF is also available on the HP XC system. The information for using standard LSF is
documented in the LSF manuals from Platform Computing. For your convenience, the HP XC
documentation CD contains these manuals.
