HP XC System 3.x Software User Manual
1.4.2 Serial Applications
You can build and run serial applications under the HP XC development environment. A serial
application is a command or application that does not use any form of parallelism.
Full details and examples of how to build, run, debug, and troubleshoot serial applications are
provided in “Building Serial Applications”.
1.5 Run-Time Environment
This section describes LSF-HPC, SLURM, and HP-MPI, and how these components work together
to provide the HP XC run-time environment. LSF-HPC focuses on scheduling (and managing
the workload) and SLURM provides efficient and scalable resource management of the compute
nodes.
Another HP XC environment features standard LSF without the interaction with the SLURM
resource manager.
1.5.1 SLURM
Simple Linux Utility for Resource Management (SLURM) is a resource management system that
is integrated into the HP XC system. SLURM is suitable for use on large and small Linux clusters.
It was developed by Lawrence Livermore National Laboratory and Linux NetworX. As a resource
manager, SLURM allocates exclusive or unrestricted access to resources (application and compute
nodes) for users to perform work, and provides a framework to start, execute and monitor work
(normally a parallel job) on the set of allocated nodes.
A SLURM system consists of two daemons, one configuration file, and a set of commands and
APIs. The central controller daemon, slurmctld, maintains the global state and directs
operations. A slurmd daemon is deployed to each computing node and responds to job-related
requests, such as launching jobs, signalling, and terminating jobs. End users and system software
(such as LSF-HPC) communicate with SLURM by means of commands or APIs, for example, to
allocate resources, launch parallel jobs on allocated resources, and terminate running jobs.
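As a rough illustration of the command interface described above, the following Python sketch builds (but does not execute) command lines for three standard SLURM commands: srun to launch work, squeue to list jobs, and scancel to terminate a job. The helper function names are hypothetical and are not part of SLURM or HP XC; only the short options -n, -p, and -u are assumed from the SLURM command set.

```python
import shlex

def srun_command(ntasks, program, partition=None):
    """Build an argv list asking SLURM to launch `program` as `ntasks` tasks."""
    argv = ["srun", "-n", str(ntasks)]
    if partition is not None:
        # -p selects the partition the job should run in
        argv += ["-p", partition]
    return argv + list(program)

def squeue_command(user):
    """Build an argv list that lists the jobs belonging to `user`."""
    return ["squeue", "-u", user]

def scancel_command(job_id):
    """Build an argv list that terminates the job with the given ID."""
    return ["scancel", str(job_id)]

if __name__ == "__main__":
    # Print the command lines instead of running them, so the sketch
    # can be tried on a machine where SLURM is not installed.
    print(shlex.join(srun_command(4, ["hostname"])))
    print(shlex.join(squeue_command("alice")))   # "alice" is a made-up user name
    print(shlex.join(scancel_command(1234)))     # 1234 is a made-up job ID
```

On a real system, each list could be passed to subprocess.run; whether users call srun directly or only through LSF-HPC depends on site policy.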
SLURM groups compute nodes (the nodes where jobs are run) together into “partitions”. The
HP XC system can have one or several partitions. When HP XC is installed, a single partition of
compute nodes is created by default for LSF-HPC batch jobs. The system administrator has the
option of creating additional partitions. For example, another partition could be created for
interactive jobs.
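The grouping of compute nodes into partitions can be pictured with a small Python model. This is only a sketch: the partition names and node names below are invented for illustration and do not reflect any particular HP XC installation.

```python
# Hypothetical layout: one default partition for LSF-HPC batch jobs plus an
# additional partition that an administrator might create for interactive jobs.
PARTITIONS = {
    "lsf": ["n1", "n2", "n3", "n4"],
    "interactive": ["n5", "n6"],
}

def partitions_of(node, layout=PARTITIONS):
    """Return the names of the partitions that contain `node`."""
    return [name for name, nodes in layout.items() if node in nodes]

def node_count(partition, layout=PARTITIONS):
    """Return how many compute nodes a partition groups together."""
    return len(layout[partition])

if __name__ == "__main__":
    print(partitions_of("n5"))
    print(node_count("lsf"))
```

On a running system, the SLURM sinfo command reports the partitions that are actually configured and the nodes in each.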
1.5.2 Load Sharing Facility (LSF-HPC)
The Load Sharing Facility for High Performance Computing (LSF-HPC) from Platform Computing
Corporation is a batch system resource manager that has been integrated with SLURM for use
on the HP XC system. LSF-HPC for SLURM is included with the HP XC System Software, and
is an integral part of the HP XC environment. LSF-HPC interacts with SLURM to obtain and
allocate available resources, and to launch and control all the jobs submitted to LSF-HPC. LSF-HPC
accepts, queues, schedules, dispatches, and controls all the batch jobs that users submit, according
to policies and configurations established by the HP XC site administrator. On an HP XC system,
LSF-HPC for SLURM is installed and runs on one HP XC node, known as the LSF execution host.
A complete description of LSF-HPC is provided elsewhere in this manual. In addition,
for your convenience, the HP XC Documentation CD contains LSF manuals from Platform
Computing.
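To make the relationship between the two layers concrete, the sketch below composes an LSF bsub submission that wraps a SLURM srun launch, so that LSF-HPC schedules the job and SLURM starts it on the allocated nodes. The helper name bsub_command is hypothetical; the options -n (number of processors), -I (interactive job), and -q (queue) are standard bsub options, but which queues exist and which options are permitted is a site-specific assumption.

```python
import shlex

def bsub_command(nprocs, job, interactive=False, queue=None):
    """Build an argv list that submits `job` to LSF-HPC on `nprocs` processors."""
    argv = ["bsub", "-n", str(nprocs)]
    if interactive:
        argv.append("-I")          # run as an interactive batch job
    if queue is not None:
        argv += ["-q", queue]      # queue names are defined by the site administrator
    return argv + list(job)

if __name__ == "__main__":
    # LSF-HPC reserves four processors; srun then launches hostname on them.
    cmd = bsub_command(4, ["srun", "hostname"], interactive=True)
    print(shlex.join(cmd))  # bsub -n 4 -I srun hostname
```

The command is printed rather than executed, so the sketch can be examined without access to an HP XC system.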