HP XC System 2.x Software User Manual

The squeue command can report on jobs in the job queue according to their state; valid states are: pending, running, completing, completed, failed, timeout, and node_fail. Example 6-3 uses the squeue command to report on failed jobs.

Example 6-3: Reporting on Failed Jobs in the Queue

$ squeue --state=FAILED

JOBID PARTITION     NAME     USER  ST       TIME  NODES NODELIST
   59      amt1 hostname     root   F       0:00      0
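
The tabular output above lends itself to simple post-processing. As a minimal sketch, the hypothetical helper jobs_in_state below pulls the job IDs out of squeue-style output. It assumes the default column order shown in Example 6-3 (JOBID first, ST fifth) and job names without spaces, and it is fed a copy of that example's output so that it runs without a SLURM installation.

```shell
# Hypothetical helper (not part of SLURM): print the JOBID of every row
# whose ST column equals the state code given as $1.
# Assumes the column layout of Example 6-3:
#   JOBID PARTITION NAME USER ST TIME NODES NODELIST
# A job name containing spaces would shift the columns and break this.
jobs_in_state() {
    awk -v st="$1" 'NR > 1 && $5 == st { print $1 }'
}

# Sample input copied from Example 6-3. On a live system, pipe squeue instead:
#   squeue --state=FAILED | jobs_in_state F
sample_squeue_output() {
    cat <<'EOF'
JOBID PARTITION     NAME     USER  ST       TIME  NODES NODELIST
   59      amt1 hostname     root   F       0:00      0
EOF
}

sample_squeue_output | jobs_in_state F
```

With the sample data this prints 59, the only job whose state code is F.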

6.6 Killing Jobs with the scancel Command

The scancel command cancels a pending or running job or job step. It can also be used to send a specified signal to all processes on all nodes associated with a job. Only job owners or administrators can cancel jobs.

Example 6-4 kills job 415 and all its job steps.

Example 6-4: Killing a Job by Its JobID

$ scancel 415

Example 6-5 cancels all pending jobs.

Example 6-5: Cancelling All Pending Jobs

$ scancel --state=PENDING
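
Because cancelling by state can affect many jobs at once, it may be worth previewing the command before running it. The following is only a sketch of one cautious pattern: the wrapper cancel_by_state and the DRY_RUN variable are hypothetical names, not part of SLURM, and the only scancel option used is the --state option from Example 6-5.

```shell
# Hypothetical wrapper (not part of SLURM).
# With DRY_RUN=1 (the default in this sketch) it only prints the scancel
# command it would run; set DRY_RUN=0 to execute scancel for real.
cancel_by_state() {
    state="$1"
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "would run: scancel --state=$state"
    else
        scancel --state="$state"
    fi
}

# Preview the command from Example 6-5 without cancelling anything.
cancel_by_state PENDING
```

In dry-run mode the call prints "would run: scancel --state=PENDING" and leaves the queue untouched.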

Example 6-6 sends the TERM signal to terminate job steps 421.2 and 421.3.

Example 6-6: Sending a Signal to a Job

$ scancel --signal=TERM 421.2 421.3
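
The arguments in Example 6-6 follow the jobid.stepid form. As an illustrative sketch, the hypothetical helper signal_steps_cmd below assembles such a command line from a signal name, a job ID, and a list of step numbers. It only prints the command and does not run scancel.

```shell
# Hypothetical helper (not part of SLURM): print, but do not run, the
# scancel command that would send signal $1 to the given steps of job $2.
# Usage: signal_steps_cmd SIGNAL JOBID STEP [STEP...]
signal_steps_cmd() {
    sig="$1"
    job="$2"
    shift 2
    cmd="scancel --signal=$sig"
    for step in "$@"; do
        # Each target is written as JOBID.STEPID, as in Example 6-6.
        cmd="$cmd $job.$step"
    done
    echo "$cmd"
}

# Rebuild the command line shown in Example 6-6.
signal_steps_cmd TERM 421 2 3
```

The call prints scancel --signal=TERM 421.2 421.3, matching the command in Example 6-6.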

6.7 Getting System Information with the sinfo Command

The sinfo command reports the state of partitions and nodes managed by SLURM. It has a wide variety of filtering, sorting, and formatting options. The command displays a summary of available partition and node (not job) information, such as partition names, nodes per partition, and CPUs per node.

Example 6-7: Using the sinfo Command (No Options)

$ sinfo

PARTITION AVAIL TIMELIMIT NODES  STATE NODELIST
lsf       up    infinite      1  down* n15
lsf       up    infinite      2  idle  n[14,16]
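
Output like this can be summarized with standard text tools. As a rough sketch, the hypothetical helper nodes_by_state below totals the NODES column for each STATE value. It assumes the default column order of Example 6-7 (NODES fourth, STATE fifth) and reads a copy of that example's output, so it runs without a SLURM installation.

```shell
# Hypothetical helper (not part of SLURM): sum the NODES column per STATE.
# Assumes the column layout of Example 6-7:
#   PARTITION AVAIL TIMELIMIT NODES STATE NODELIST
# The state is used exactly as printed, so "down*" keeps its asterisk.
nodes_by_state() {
    awk 'NR > 1 { count[$5] += $4 }
         END    { for (s in count) print s, count[s] }' | sort
}

# Sample input copied from Example 6-7. On a live system, pipe sinfo instead:
#   sinfo | nodes_by_state
sample_sinfo_output() {
    cat <<'EOF'
PARTITION AVAIL TIMELIMIT NODES  STATE NODELIST
lsf       up    infinite      1  down* n15
lsf       up    infinite      2  idle  n[14,16]
EOF
}

sample_sinfo_output | nodes_by_state
```

For the sample data this reports one node in the down* state and two idle nodes.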
