HP XC System 4.x Software User Manual
1.4.2 Serial Applications
You can build and run serial applications under the HP XC development environment. A serial
application is a command or application that does not use any form of parallelism.
Full details and examples of how to build, run, debug, and troubleshoot serial applications are
provided in “Building Serial Applications”.
1.5 Run-Time Environment
This section describes LSF, SLURM, and HP-MPI, and how these components work together to
provide the HP XC run-time environment. LSF focuses on scheduling and managing the
workload, while SLURM provides efficient and scalable resource management of the compute
nodes.
Another HP XC environment features standard LSF, which runs without interacting with the
SLURM resource manager.
1.5.1 SLURM
Simple Linux Utility for Resource Management (SLURM) is a resource management system that
is integrated into the HP XC system. SLURM is suitable for use on large and small Linux clusters.
It was developed by Lawrence Livermore National Lab and Linux Networks. As a resource
manager, SLURM allocates exclusive or unrestricted access to resources (application and compute
nodes) for users to perform work, and provides a framework to start, execute and monitor work
(normally a parallel job) on the set of allocated nodes.
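To make the difference between exclusive and unrestricted node access concrete, here is a toy Python model. It is a sketch under stated assumptions, not SLURM's node-selection algorithm: `Node` and `allocate` are hypothetical names, and "unrestricted" is assumed to mean that several non-exclusive jobs may share a node.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Set


@dataclass
class Node:
    """A compute node in a toy model of SLURM-style allocation (not SLURM's real logic)."""
    name: str
    exclusive_owner: Optional[str] = None               # job holding the node exclusively
    shared_jobs: Set[str] = field(default_factory=set)  # jobs sharing the node

    def is_idle(self) -> bool:
        return self.exclusive_owner is None and not self.shared_jobs


def allocate(nodes: List[Node], job_id: str, count: int, exclusive: bool) -> List[str]:
    """Grant `count` nodes to `job_id` or raise RuntimeError.

    Assumption: an exclusive request needs fully idle nodes, while an
    unrestricted (shared) request may use any node that no job holds exclusively.
    """
    if exclusive:
        candidates = [n for n in nodes if n.is_idle()]
    else:
        candidates = [n for n in nodes if n.exclusive_owner is None]
    if len(candidates) < count:
        raise RuntimeError(f"{job_id}: {count} nodes requested, {len(candidates)} available")
    granted = candidates[:count]
    for node in granted:
        if exclusive:
            node.exclusive_owner = job_id
        else:
            node.shared_jobs.add(job_id)
    return [n.name for n in granted]
```

With four idle nodes, an exclusive two-node job takes `n1` and `n2`, a shared two-node job takes `n3` and `n4`, a second shared job can still land on `n3`, but a further exclusive request fails because no node is fully idle.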
A SLURM system consists of two daemons, one configuration file, and a set of commands and
APIs. The central controller daemon, slurmctld, maintains the global state and directs
operations. A slurmd daemon runs on each compute node and responds to job-related
requests, such as launching, signalling, and terminating jobs. End users and system software
(such as LSF) communicate with SLURM through commands or APIs, for example to allocate
resources, launch parallel jobs on allocated resources, and terminate running jobs.
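The division of labor between slurmctld and slurmd can be sketched as an in-process Python model. This is only an illustration: the real daemons communicate over the network, and the class and method names (`Controller`, `NodeDaemon`, `handle`) are hypothetical, not SLURM APIs.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class NodeDaemon:
    """Toy stand-in for slurmd: acts only on requests sent to its node."""
    node: str
    jobs: Dict[str, str] = field(default_factory=dict)  # job id -> local state

    def handle(self, request: str, job_id: str) -> str:
        if request == "launch":
            self.jobs[job_id] = "running"
        elif request == "signal" and job_id in self.jobs:
            self.jobs[job_id] = "signalled"
        elif request == "terminate":
            self.jobs.pop(job_id, None)
        else:
            raise ValueError(f"{self.node}: cannot {request} job {job_id}")
        return f"{self.node}: {request} {job_id}"


class Controller:
    """Toy stand-in for slurmctld: holds the global job->nodes state and directs daemons."""

    def __init__(self, daemons: List[NodeDaemon]) -> None:
        self.daemons = {d.node: d for d in daemons}
        self.jobs: Dict[str, List[str]] = {}  # global state: job id -> its nodes

    def launch(self, job_id: str, nodes: List[str]) -> List[str]:
        self.jobs[job_id] = list(nodes)
        return self._send("launch", job_id)

    def signal(self, job_id: str) -> List[str]:
        return self._send("signal", job_id)

    def terminate(self, job_id: str) -> List[str]:
        replies = self._send("terminate", job_id)
        del self.jobs[job_id]
        return replies

    def _send(self, request: str, job_id: str) -> List[str]:
        # Real SLURM uses network RPCs; a direct method call stands in for them here.
        return [self.daemons[n].handle(request, job_id) for n in self.jobs[job_id]]
```

Only the controller knows which nodes belong to which job; each node daemon sees just the requests addressed to it, which mirrors the global-state versus per-node split described above.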
SLURM groups compute nodes (the nodes where jobs are run) together into “partitions”. The
HP XC system can have one or several partitions. When HP XC is installed, a single partition of
compute nodes is created by default for LSF batch jobs. The system administrator has the option
of creating additional partitions. For example, another partition could be created for interactive
jobs.
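On a SLURM system, partitions are declared in the SLURM configuration file, slurm.conf, with `PartitionName` lines. The sketch below renders such lines from a small Python description. The partition names (`batch`, `interactive`) and node names are hypothetical, the duplicate-name check is this sketch's own, and on an HP XC system the administrator, not the end user, maintains these definitions.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Partition:
    """One SLURM partition: a named group of compute nodes (toy description)."""
    name: str              # hypothetical partition name, e.g. "batch"
    nodes: str             # SLURM hostlist expression, e.g. "n[1-14]" (hypothetical nodes)
    default: bool = False  # used when a job does not name a partition

    def to_conf_line(self) -> str:
        """Render this partition as a slurm.conf PartitionName line."""
        flag = "YES" if self.default else "NO"
        return f"PartitionName={self.name} Nodes={self.nodes} Default={flag} State=UP"


def render_partitions(partitions: List[Partition]) -> str:
    """Render all partitions, rejecting duplicate names (a check added by this sketch)."""
    names = [p.name for p in partitions]
    if len(names) != len(set(names)):
        raise ValueError(f"duplicate partition names in {names}")
    return "\n".join(p.to_conf_line() for p in partitions)


if __name__ == "__main__":
    print(render_partitions([
        Partition("batch", "n[1-14]", default=True),  # hypothetical batch partition
        Partition("interactive", "n[15-16]"),         # hypothetical admin-added partition
    ]))
```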
1.5.2 Load Sharing Facility (LSF)
The Load Sharing Facility (LSF) from Platform Computing, Inc. is a batch system resource manager
that has been integrated with SLURM for use on the HP XC system. LSF for SLURM is included
with the HP XC System Software, and is an integral part of the HP XC environment. LSF interacts
with SLURM to obtain and allocate available resources, and to launch and control all the jobs
submitted to LSF. LSF accepts, queues, schedules, dispatches, and controls all the batch jobs that
users submit, according to policies and configurations established by the HP XC site administrator.
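A toy Python model can illustrate this split: the batch system owns the queue and the dispatch decision, and asks the resource manager for nodes. `BatchScheduler` and `ResourceManager` are hypothetical stand-ins for LSF and SLURM, and the strict first-come, first-served policy is an assumption standing in for the site-configured policies that LSF actually applies.

```python
from collections import deque
from dataclasses import dataclass
from typing import Deque, Dict, List, Optional


@dataclass
class Job:
    job_id: str
    nodes_needed: int


class ResourceManager:
    """Toy stand-in for SLURM's role: hand out and reclaim compute nodes."""

    def __init__(self, nodes: List[str]) -> None:
        self.free = list(nodes)
        self.allocations: Dict[str, List[str]] = {}

    def allocate(self, job: Job) -> Optional[List[str]]:
        if len(self.free) < job.nodes_needed:
            return None  # not enough free nodes right now
        k = job.nodes_needed
        granted, self.free = self.free[:k], self.free[k:]
        self.allocations[job.job_id] = granted
        return granted

    def release(self, job_id: str) -> None:
        self.free.extend(self.allocations.pop(job_id))


class BatchScheduler:
    """Toy stand-in for LSF's role: accept, queue, schedule, and dispatch jobs."""

    def __init__(self, rm: ResourceManager) -> None:
        self.rm = rm
        self.queue: Deque[Job] = deque()
        self.dispatched: Dict[str, List[str]] = {}

    def submit(self, job: Job) -> None:
        self.queue.append(job)

    def schedule(self) -> List[str]:
        """Dispatch queued jobs in FIFO order; stop at the first job that does not fit."""
        started = []
        while self.queue:
            nodes = self.rm.allocate(self.queue[0])
            if nodes is None:
                break  # strict FIFO (an assumption): later jobs wait behind the head
            job = self.queue.popleft()
            self.dispatched[job.job_id] = nodes
            started.append(job.job_id)
        return started

    def complete(self, job_id: str) -> None:
        del self.dispatched[job_id]
        self.rm.release(job_id)
```

With four nodes and jobs needing 3, 2, and 1 nodes, the first pass dispatches only the 3-node job; the 1-node job waits behind the 2-node job under this strict policy. A real LSF site could configure a different policy, such as backfill.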
On an HP XC system, LSF for SLURM is installed and runs on one HP XC node, known as the
A complete description of LSF is provided in
. In addition, for your
convenience, the HP XC Documentation CD contains LSF manuals from Platform Computing.
1.5.3 Standard LSF
Standard LSF is also available on the HP XC system. Information on using standard LSF is
documented in the LSF manuals from Platform Computing. For your convenience, the HP XC
documentation CD contains these manuals.