5 terminating jobs with the scancel command, Displaying queued jobs by their jobids, Reporting on failed jobs in the queue – HP XC System 4.x Software User Manual

Example 9-2 Displaying Queued Jobs by Their JobIDs

$ squeue --jobs 12345,12346
JOBID PARTITION NAME USER ST TIME_USED NODES NODELIST(REASON)
12345 debug job1 jody R 0:21 4 n[9-12]
12346 debug job2 jody PD 0:00 8

The squeue command can report on jobs in the job queue according to their state; possible states
are: pending, running, completing, completed, failed, timeout, and node_fail.

Example 9-3 uses the squeue command to report on failed jobs.

Example 9-3 Reporting on Failed Jobs in the Queue

$ squeue --state=FAILED
JOBID PARTITION NAME USER ST TIME NODES NODELIST(REASON)
59 amt1 hostname root F 0:00 0
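The state filtering shown above can also be applied to squeue output that has already been captured. The following is a minimal sketch, not part of the original manual: the function name jobs_in_state is hypothetical, and it assumes the column layout shown in Example 9-2, with the state code in the fifth column and a single header line.

```shell
# Print the JOBID of every row whose ST column matches the given state code.
# Reads squeue-style text on stdin; the first line is assumed to be the header.
# jobs_in_state is a hypothetical helper name, not a SLURM command.
jobs_in_state() {
    awk -v st="$1" 'NR > 1 && $5 == st { print $1 }'
}

# Sample text copied from Example 9-2; on a live system, pipe squeue output instead.
sample='JOBID PARTITION NAME USER ST TIME_USED NODES NODELIST(REASON)
12345 debug job1 jody R 0:21 4 n[9-12]
12346 debug job2 jody PD 0:00 8'

# List the pending (PD) jobs in the sample.
printf '%s\n' "$sample" | jobs_in_state PD
```

If the output columns are customized, the field number in the awk expression would need to change accordingly.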

9.5 Terminating Jobs with the scancel Command

The scancel command cancels a pending or running job or job step. It can also be used to send
a specified signal to all processes on all nodes associated with a job. Only job owners or
administrators can cancel jobs.

Example 9-4 terminates job #415 and all its jobsteps.

Example 9-4 Terminating a Job by Its JobID

$ scancel 415

Example 9-5 cancels all pending jobs.

Example 9-5 Cancelling All Pending Jobs

$ scancel --state=PENDING

Example 9-6 sends the TERM signal to terminate jobsteps 421.2 and 421.3.

Example 9-6 Sending a Signal to a Job

$ scancel --signal=TERM 421.2 421.3
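Because scancel acts immediately, it can be useful to check job and jobstep identifiers before passing them on. The following is a minimal sketch under stated assumptions: is_job_spec and scancel_dry_run are hypothetical helper names, identifiers are assumed to take the forms used in the examples above (jobid or jobid.stepid), and the helper only prints the command line without ever running scancel.

```shell
# Succeed if the argument looks like JOBID or JOBID.STEPID (digits only).
# Hypothetical helper; the accepted forms are those used in Examples 9-4 and 9-6.
is_job_spec() {
    printf '%s\n' "$1" | grep -Eq '^[0-9]+(\.[0-9]+)?$'
}

# Print the scancel command line that would be run for the given signal and
# job specs, or fail on the first malformed spec. It never invokes scancel.
scancel_dry_run() {
    signal=$1
    shift
    for spec in "$@"; do
        is_job_spec "$spec" || { echo "bad job spec: $spec" >&2; return 1; }
    done
    echo "scancel --signal=$signal $*"
}

# Show the command from Example 9-6 without executing it.
scancel_dry_run TERM 421.2 421.3
```

Once the printed command line looks right, it can be run as shown in Example 9-6.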

9.6 Getting System Information with the sinfo Command

The sinfo command reports the state of partitions and nodes managed by SLURM. It has a
wide variety of filtering, sorting, and formatting options. The sinfo command displays a
summary of available partition and node (not job) information, such as partition names,
nodes/partition, and cores/node.

Example 9-7 Using the sinfo Command (No Options)

$ sinfo
PARTITION AVAIL TIMELIMIT NODES STATE NODELIST
lsf up infinite 3 down* n[0,5,8]
lsf up infinite 14 idle n[1-4,6-7,9-16]

A node STATE code in these examples may be followed by an asterisk (*); the asterisk indicates
that the reported node is not responding. See the sinfo(1) manpage for a complete listing
and description of STATE codes.
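The asterisk convention makes it straightforward to pick out nodes that are not responding. The following is a minimal sketch, not part of the original manual: the function name not_responding is hypothetical, and it assumes the default column layout shown in Example 9-7, with STATE in the fifth column and NODELIST in the sixth.

```shell
# Print the NODELIST of every row whose STATE ends in an asterisk.
# Reads sinfo-style text on stdin; the first line is assumed to be the header.
# not_responding is a hypothetical helper name, not a SLURM command.
not_responding() {
    awk 'NR > 1 && $5 ~ /\*$/ { print $6 }'
}

# Sample text copied from Example 9-7; on a live system, pipe sinfo output instead.
sample='PARTITION AVAIL TIMELIMIT NODES STATE NODELIST
lsf up infinite 3 down* n[0,5,8]
lsf up infinite 14 idle n[1-4,6-7,9-16]'

# List the node ranges flagged as not responding.
printf '%s\n' "$sample" | not_responding
```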
