
HP XC System 3.x Software User Manual


1  This line attempts to submit a program that does not exist.

The following command line makes the program and executes it:

$ bsub -o %J.out -n2 -ext "SLURM[nodes=2]" make -j3 \
-f ./mymake PPR_ARGS=100000
Job <117> is submitted to default queue .
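
Because the -o %J.out option names the output file after the LSF job ID, a script that submits the job needs the ID from the bsub confirmation line to locate that file. The following is a minimal sketch, assuming the confirmation line has the form shown above; the function name job_id_from_bsub is hypothetical and not part of LSF.

```shell
# Hypothetical helper (not an LSF command): read a bsub confirmation
# line on stdin and print the numeric job ID found between "Job <" and ">".
job_id_from_bsub() {
    sed -n 's/^Job <\([0-9][0-9]*\)>.*/\1/p'
}

# Example using the confirmation line from this section.
jobid=$(echo 'Job <117> is submitted to default queue .' | job_id_from_bsub)
echo "${jobid}.out"    # prints 117.out
```

In a real script, the echo would be replaced by the bsub command line itself, piped into the helper.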

The output file contains error messages related to the attempt to launch the nonexistent program.

$ cat 117.out

.
.
.

mpirun -srun -N 2 -n 4 ./ping_pong_ring 100000
mpirun -srun -N 2 -n 4 ./ping_pong_ring 100000
mpirun -srun -N 2 -n 4 ./ping_bogus 100000
slurmstepd: [19.3]: error: execve(): ./ping_bogus: No such file or directory
slurmstepd: [19.3]: error: execve(): ./ping_bogus: No such file or directory
srun: error: n14: task0: Exited with exit code 2
srun: Terminating job
slurmstepd: [19.3]: error: execve(): ./ping_bogus: No such file or directory
slurmstepd: [19.3]: error: execve(): ./ping_bogus: No such file or directory
make: *** [run3] Error 2
make: *** Waiting for unfinished jobs....
[0:n14] ping-pong 100000 bytes ...
100000 bytes: 99.06 usec/msg
100000 bytes: 1009.51 MB/sec
[0:n14] ping-pong 100000 bytes ...
100000 bytes: 99.76 usec/msg
100000 bytes: 1002.43 MB/sec
[1:n14] ping-pong 100000 bytes ...
100000 bytes: 1516.83 usec/msg
100000 bytes: 65.93 MB/sec
[1:n14] ping-pong 100000 bytes ...
100000 bytes: 1519.73 usec/msg
100000 bytes: 65.80 MB/sec
[2:n15] ping-pong 100000 bytes ...
100000 bytes: 108.65 usec/msg
100000 bytes: 920.38 MB/sec
[2:n15] ping-pong 100000 bytes ...
100000 bytes: 99.44 usec/msg
100000 bytes: 1005.65 MB/sec
[3:n15] ping-pong 100000 bytes ...
100000 bytes: 1877.35 usec/msg
100000 bytes: 53.27 MB/sec
[3:n15] ping-pong 100000 bytes ...
100000 bytes: 1888.22 usec/msg
100000 bytes: 52.96 MB/sec
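
Because make runs the three targets in parallel, the error messages are interleaved with the output of the two runs that succeed. The following is a minimal sketch for extracting only the failure lines, assuming the message formats shown above; show_errors is a hypothetical helper name.

```shell
# Hypothetical helper: print only the lines that report a failure.
# Matches SLURM/srun "error:" messages and make's "*** [target] Error N" lines.
# With no arguments it reads stdin; otherwise it reads the named files.
show_errors() {
    grep -E 'error:|^make: \*\*\* \[.*] Error [0-9]+' "$@"
}

# Example on a few lines taken from the output above.
show_errors <<'EOF'
mpirun -srun -N 2 -n 4 ./ping_bogus 100000
srun: error: n14: task0: Exited with exit code 2
make: *** [run3] Error 2
make: *** Waiting for unfinished jobs....
100000 bytes: 99.06 usec/msg
EOF
```

For the job in this section, the helper would be invoked as: show_errors 117.out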

The sacct command, which displays SLURM accounting information, reflects the error:

[lsfadmin@n16 ~]$ sacct -j 19
Jobstep    Jobname            Partition  Ncpus   Status     Error
---------- ------------------ ---------- ------- ---------- -----
19         hptclsf@117        lsf        8       CANCELLED  2
19.0       hptclsf@117        lsf        0       FAILED     2
19.1       hptclsf@117        lsf        8       COMPLETED  0
19.2       hptclsf@117        lsf        8       COMPLETED  0
19.3       hptclsf@117        lsf        8       FAILED     2
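
The job steps that failed can be picked out of this listing by checking the Error column. The following is a minimal sketch, assuming the six-column layout shown above (other SLURM versions may format sacct output differently); failed_steps is a hypothetical helper name.

```shell
# Hypothetical helper: read sacct output in the layout shown above on stdin
# and print the job step, status, and error code of each row whose
# Error column is non-zero.
failed_steps() {
    # NR > 2 skips the header and separator lines; $NF is the Error column.
    awk 'NR > 2 && NF >= 6 && $NF != 0 { print $1, $(NF-1), $NF }'
}

# Example on the listing from this section.
failed_steps <<'EOF'
Jobstep Jobname Partition Ncpus Status Error
---------- ------------------ ---------- ------- ---------- -----
19 hptclsf@117 lsf 8 CANCELLED 2
19.0 hptclsf@117 lsf 0 FAILED 2
19.1 hptclsf@117 lsf 8 COMPLETED 0
19.2 hptclsf@117 lsf 8 COMPLETED 0
19.3 hptclsf@117 lsf 8 FAILED 2
EOF
```

On a live system, the here-document would be replaced by a pipe: sacct -j 19 | failed_steps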
