
HP XC System 3.x Software User Manual: Useful Commands, Job Startup and Job Control, Preemption


Pseudo-parallel job

A job that requests only one slot but specifies any of these constraints:

mem

tmp

nodes=1

mincpus > 1

Pseudo-parallel jobs are allocated one node for their exclusive use.

NOTE: Do NOT rely on this feature to provide node-level allocation for small
jobs in job scripts. Use the SLURM[nodes] specification instead, along with
the mem, tmp, and mincpus allocation options.

LSF-HPC considers this job type as a parallel job because the job requests
explicit node resources. LSF-HPC does not monitor these additional
resources, so it cannot schedule any other jobs to the node without risking
resource contention. Therefore LSF-HPC allocates the appropriate whole
node for exclusive use by the serial job in the same manner as it does for
parallel jobs, hence the name “pseudo-parallel”.

Parallel job

A job that requests more than one slot, regardless of any other constraints.
Parallel jobs are allocated up to the maximum number of nodes specified
by the following specifications:

SLURM[nodes=min-max] (if specified)

SLURM[nodelist=node_list] (if specified)

bsub -n

Parallel jobs and serial jobs cannot run on the same node.

Small job

A parallel job that can potentially fit on a single node and does not
explicitly request more than one node (through a SLURM[nodes] or
SLURM[nodelist] specification). LSF-HPC tries to allocate a single
node for a small job.
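
The job-type definitions above can be restated as a small decision function. The following Python sketch is illustrative only: the `JobRequest` fields, the `classify` function, and the `cpus_per_node` parameter are hypothetical names, not part of LSF-HPC. It assumes that a one-slot job with no constraints is a serial job, and that "explicitly requesting more than one node" means a minimum node count above 1 or a node list with more than one entry.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class JobRequest:
    """Hypothetical summary of a job submission (names are illustrative)."""
    slots: int                          # value passed to bsub -n
    mem: Optional[int] = None           # mem constraint, if given
    tmp: Optional[int] = None           # tmp constraint, if given
    mincpus: Optional[int] = None       # mincpus constraint, if given
    nodes_min: Optional[int] = None     # minimum from SLURM[nodes=min-max]
    nodelist: Tuple[str, ...] = ()      # nodes named in SLURM[nodelist=...]


def classify(job: JobRequest, cpus_per_node: int) -> str:
    """Return the job type according to the definitions in this section."""
    if job.slots > 1:
        # More than one slot is always a parallel job; it is a "small" job
        # when it could fit on one node and does not ask for several nodes.
        wants_many_nodes = (job.nodes_min or 0) > 1 or len(job.nodelist) > 1
        if job.slots <= cpus_per_node and not wants_many_nodes:
            return "small"
        return "parallel"

    # One slot: any of these constraints makes the job pseudo-parallel.
    constrained = (
        job.mem is not None
        or job.tmp is not None
        or job.nodes_min == 1
        or (job.mincpus or 0) > 1
    )
    # Assumption: an unconstrained one-slot job is an ordinary serial job.
    return "pseudo-parallel" if constrained else "serial"
```

For example, under these assumptions `classify(JobRequest(slots=1, mem=2048), cpus_per_node=4)` yields `"pseudo-parallel"`, while `classify(JobRequest(slots=4), cpus_per_node=4)` yields `"small"`.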

10.5 Using LSF-HPC Integrated with SLURM in the HP XC Environment

This section provides additional information about using LSF-HPC in the HP XC
environment.

10.5.1 Useful Commands

The following commands are useful with LSF-HPC integrated with SLURM:

Use the bjobs -l and bhist -l commands to see the components of the actual SLURM allocation
command.

Use the bkill command to kill jobs.

Use the bjobs command to monitor job status in LSF-HPC integrated with SLURM.

Use the bqueues command to list the configured job queues in LSF-HPC integrated with SLURM.
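
For scripted use, the commands listed above can be assembled as argument lists and passed to `subprocess.run`. The sketch below only builds the argument lists and does not run anything. The `lsf_command` helper and its action names (`"detail"`, `"history"`, `"kill"`, `"status"`, `"queues"`) are hypothetical, chosen for illustration; only the command names and the `-l` option come from this section.

```python
from typing import List, Optional


def lsf_command(action: str, job_id: Optional[str] = None) -> List[str]:
    """Build the argument list for one of the LSF-HPC commands above.

    The action names are illustrative labels, not LSF-HPC terminology.
    """
    needs_job = {
        "detail": ["bjobs", "-l"],   # components of the SLURM allocation
        "history": ["bhist", "-l"],  # same information from job history
        "kill": ["bkill"],           # kill a job
    }
    if action in needs_job:
        if job_id is None:
            raise ValueError(f"action {action!r} requires a job ID")
        return needs_job[action] + [job_id]
    if action == "status":
        # bjobs alone reports job status; a job ID narrows the report.
        return ["bjobs"] + ([job_id] if job_id else [])
    if action == "queues":
        return ["bqueues"]           # list the configured job queues
    raise ValueError(f"unknown action {action!r}")
```

On a system where LSF-HPC is installed, the result could be run with, for example, `subprocess.run(lsf_command("detail", "1234"), capture_output=True, text=True)`; the job ID here is a placeholder.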

10.5.2 Job Startup and Job Control

When LSF-HPC starts a SLURM job, it sets SLURM_JOBID to associate the job with the SLURM allocation.
While a job is running, all operating-system-enforced resource limits that LSF-HPC supports are in effect,
including the core limit, CPU time limit, data limit, file size limit, memory limit, and stack limit. If the user
kills a job, LSF-HPC propagates signals to the entire job, including the job file running on the local node and
all tasks running on remote nodes.
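
A job can inspect this environment from inside its own process. The Python sketch below reads SLURM_JOBID and reports the soft limits the operating system currently enforces. The helper names (`slurm_job_id`, `soft_limits`) and the mapping from limit names to `RLIMIT_*` constants are assumptions made for illustration; the memory limit is left out because this section does not say which operating-system limit it corresponds to.

```python
import os
import resource

# Assumed mapping from the limits named above to POSIX resource constants.
LIMITS = {
    "core": resource.RLIMIT_CORE,
    "cpu time": resource.RLIMIT_CPU,
    "data": resource.RLIMIT_DATA,
    "file size": resource.RLIMIT_FSIZE,
    "stack": resource.RLIMIT_STACK,
}


def slurm_job_id(environ=os.environ):
    """Return the SLURM job ID set for this job, or None if it is absent."""
    return environ.get("SLURM_JOBID")


def soft_limits():
    """Return the current soft limit for each entry in LIMITS.

    A value of None means the limit is unlimited.
    """
    limits = {}
    for name, res in LIMITS.items():
        soft, _hard = resource.getrlimit(res)
        limits[name] = None if soft == resource.RLIM_INFINITY else soft
    return limits


if __name__ == "__main__":
    print("SLURM_JOBID:", slurm_job_id())
    for name, value in soft_limits().items():
        print(f"{name} limit:", "unlimited" if value is None else value)
```

Run outside a SLURM allocation, `slurm_job_id()` returns `None`, which makes the function a simple check for whether a script was started through LSF-HPC and SLURM.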

10.5.3 Preemption

LSF-HPC uses the SLURM "node share" feature to facilitate preemption. When a low-priority job is
preempted, its processes are suspended on the allocated nodes, and LSF-HPC places the high-priority job on
the same node. After the high-priority job completes, LSF-HPC resumes the suspended low-priority jobs.
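
The suspend, run, and resume sequence described above can be modeled as a toy state machine. The Python sketch below is a simplified illustration for a single node, not a description of how LSF-HPC or SLURM implement preemption; the `Job` and `Node` classes and the state names are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Job:
    name: str
    priority: int
    state: str = "pending"


@dataclass
class Node:
    running: Optional[Job] = None
    suspended: List[Job] = field(default_factory=list)

    def start(self, job: Job) -> None:
        """Start a job, suspending a lower-priority job already on the node."""
        if self.running is not None:
            if job.priority <= self.running.priority:
                raise RuntimeError("node is busy; job must wait")
            # Preemption: the low-priority job is suspended, not killed.
            self.running.state = "suspended"
            self.suspended.append(self.running)
        job.state = "running"
        self.running = job

    def finish(self) -> Job:
        """Complete the running job and resume the last suspended job."""
        if self.running is None:
            raise RuntimeError("no job is running")
        done = self.running
        done.state = "done"
        self.running = None
        if self.suspended:
            resumed = self.suspended.pop()
            resumed.state = "running"
            self.running = resumed
        return done
```

Starting a low-priority job, then a high-priority job, and then calling `finish()` leaves the high-priority job in state `"done"` and the low-priority job running again on the same node.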
