1 useful commands, 2 job startup and job control, 3 preemption – HP XC System 4.x Software User Manual
allocates the appropriate whole node for exclusive use by the
serial job in the same manner as it does for parallel jobs, hence
the name “pseudo-parallel”.
Parallel job
A job that requests more than one slot, regardless of any other
constraints. Parallel jobs are allocated up to the maximum
number of nodes specified by the following specifications:
• SLURM[nodes=min-max] (if specified)
• SLURM[nodelist=node_list] (if specified)
• bsub -n
Parallel jobs and serial jobs cannot run on the same node.
Small job
A parallel job that can potentially fit into a single node, and
does not explicitly request more than one node (through a SLURM[nodes]
or SLURM[nodelist] specification). LSF tries to allocate a
single node for a small job.
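The distinction between a parallel job and a small job can be sketched as a short classification function. This is only an illustration of the definitions above, not LSF's actual scheduling logic: the function name, the cores_per_node parameter, and the reading of "can potentially fit into a single node" as "requests no more slots than one node has cores" are all assumptions.

```python
from typing import Optional


def classify_parallel_job(slots: int,
                          cores_per_node: int,
                          explicit_nodes: Optional[int] = None) -> str:
    """Classify a job request using the definitions in this section.

    slots          -- number of slots requested (the bsub -n value)
    cores_per_node -- hypothetical per-node capacity (assumption)
    explicit_nodes -- node count explicitly requested through a
                      SLURM[nodes] or SLURM[nodelist] specification,
                      or None if no such specification was given
    """
    # A parallel job requests more than one slot.
    if slots <= 1:
        return "not parallel"

    # A small job could fit into one node and does not explicitly
    # request more than one node.
    asks_for_many_nodes = explicit_nodes is not None and explicit_nodes > 1
    if slots <= cores_per_node and not asks_for_many_nodes:
        return "small"

    return "parallel"
```

For example, under these assumptions a 4-slot request on 8-core nodes is classified as a small job unless it also explicitly asks for two or more nodes.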
10.5 Using LSF Integrated with SLURM in the HP XC Environment
This section provides some additional information that should be noted about using LSF in the
HP XC Environment.
10.5.1 Useful Commands
The following commands are useful with LSF integrated with SLURM:
• Use the bjobs -l and bhist -l commands to see the components of the actual SLURM allocation command.
• Use the bkill command to kill jobs.
• Use the bjobs command to monitor job status in LSF integrated with SLURM.
• Use the bqueues command to list the configured job queues in LSF integrated with SLURM.
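The commands listed above can be wrapped in a small helper for scripting. The following is a minimal sketch, not part of LSF: the function names and the action labels are hypothetical, and the helper only builds argument lists for bjobs, bhist, bkill, and bqueues, running them only if the command is found on the PATH.

```python
import shutil
import subprocess

# Hypothetical action labels mapped to the commands named in this section.
_TEMPLATES = {
    "status": ["bjobs"],         # monitor job status
    "detail": ["bjobs", "-l"],   # shows the SLURM allocation components
    "history": ["bhist", "-l"],  # shows the SLURM allocation components
    "kill": ["bkill"],           # kill a job
    "queues": ["bqueues"],       # list configured job queues
}


def lsf_command(action, job_id=None):
    """Return the argument list for one of the commands above."""
    if action == "kill" and job_id is None:
        raise ValueError("a job ID is needed to kill a job")
    argv = list(_TEMPLATES[action])
    if job_id is not None:
        argv.append(str(job_id))
    return argv


def run_if_available(argv):
    """Run the command and return its output, or None if LSF is absent."""
    if shutil.which(argv[0]) is None:
        return None
    result = subprocess.run(argv, capture_output=True, text=True, check=False)
    return result.stdout
```

For instance, run_if_available(lsf_command("detail", 123)) would run bjobs -l 123 on a system where LSF is installed; the job ID 123 is a placeholder.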
10.5.2 Job Startup and Job Control
When LSF starts a SLURM job, it sets SLURM_JOBID to associate the job with the SLURM
allocation. While a job is running, all operating-system-enforced resource limits that LSF
supports are in effect, including the core limit, CPU time limit, data limit, file size limit,
memory limit, and stack limit. If the user kills a job, LSF propagates signals to the entire job,
including the job file running on the local node and all tasks running on remote nodes.
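A job script can inspect this environment from the inside. The sketch below reads SLURM_JOBID and queries the corresponding process limits through Python's standard resource module. The function name is hypothetical, and the mapping of the "memory limit" to RLIMIT_RSS is an assumption; the limit LSF actually enforces for memory may differ on a given system.

```python
import os
import resource

# Limit names from this section mapped to POSIX rlimit constants.
# The "memory" -> RLIMIT_RSS mapping is an assumption.
LIMITS = {
    "core": "RLIMIT_CORE",
    "cpu time": "RLIMIT_CPU",
    "data": "RLIMIT_DATA",
    "file size": "RLIMIT_FSIZE",
    "memory": "RLIMIT_RSS",
    "stack": "RLIMIT_STACK",
}


def job_environment(environ=os.environ):
    """Report the SLURM job ID and the (soft, hard) limits in effect."""
    report = {"SLURM_JOBID": environ.get("SLURM_JOBID")}
    for label, name in LIMITS.items():
        constant = getattr(resource, name, None)  # not every OS defines all
        report[label] = (
            resource.getrlimit(constant) if constant is not None else None
        )
    return report


if __name__ == "__main__":
    for key, value in job_environment().items():
        print(f"{key}: {value}")
```

Outside a job, SLURM_JOBID is reported as None, which makes the same script usable for comparing limits inside and outside an allocation.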
10.5.3 Preemption
LSF uses the SLURM "node share" feature to facilitate preemption. When a low-priority job is
preempted, its processes are suspended on the allocated nodes, and LSF places the high-priority
job on the same nodes. After the high-priority job completes, LSF resumes the suspended
low-priority jobs.
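The suspend-and-resume sequence described above can be pictured with a toy model. This is purely illustrative and does not reflect how LSF or SLURM are implemented: the Node class and its method names are hypothetical, and a single node with one running job at a time is assumed.

```python
class Node:
    """Toy model of one node that supports preemption (illustrative only)."""

    def __init__(self):
        self.running = None   # job currently executing on the node
        self.suspended = []   # preempted jobs waiting to be resumed

    def start(self, job):
        """Place a job on an idle node."""
        if self.running is not None:
            raise RuntimeError("node is busy; use preempt() instead")
        self.running = job

    def preempt(self, high_priority_job):
        """Suspend the running job and run the high-priority job here."""
        if self.running is not None:
            self.suspended.append(self.running)
        self.running = high_priority_job

    def finish(self):
        """Complete the running job and resume the last suspended one."""
        done = self.running
        self.running = self.suspended.pop() if self.suspended else None
        return done


node = Node()
node.start("low-priority job")
node.preempt("high-priority job")  # low-priority job is suspended
node.finish()                      # high-priority job completes
print(node.running)                # the low-priority job runs again
```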
10.6 Submitting Jobs
The bsub command submits jobs to LSF; it is used to request a set of resources on which to
launch a job. This section focuses on enhancements to this command from the LSF integration
with SLURM on the HP XC system; this section does not discuss standard bsub functionality
or flexibility. See the Platform LSF documentation and the bsub(1) manpage for more information
on this important command. The topic of submitting jobs with the LSF-SLURM External Scheduler
is explored in detail in “Submitting a Parallel Job Using the SLURM External Scheduler”.
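As a rough illustration of how a submission might be assembled in a script, the sketch below builds a bsub argument list with a slot count and an optional SLURM[...] string. The helper name is hypothetical, and passing the SLURM[...] string through the -ext option is an assumption based on the external scheduler usage covered in the referenced section; the command srun hostname is only a placeholder job.

```python
def bsub_command(slots, command, slurm_option=None):
    """Build a bsub argument list (sketch; does not submit anything).

    slots        -- value for bsub -n
    command      -- the job command as a list of strings
    slurm_option -- optional single external scheduler option, such as
                    "nodes=2-4" or "nodelist=node_list" (assumed syntax)
    """
    if slots < 1:
        raise ValueError("slots must be at least 1")
    argv = ["bsub", "-n", str(slots)]
    if slurm_option is not None:
        # Assumption: the option is passed as -ext "SLURM[...]".
        argv += ["-ext", f"SLURM[{slurm_option}]"]
    return argv + list(command)


# Placeholder job: four slots, at most four nodes.
print(" ".join(bsub_command(4, ["srun", "hostname"], "nodes=1-4")))
```

Building the argument list separately from running it keeps the sketch safe to try on a machine without LSF installed.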
The HP XC system has several features that make it optimal for running parallel applications,
particularly (but not exclusively)
applications. You can use the bsub command's -n to