Example 10-10 Launching an Interactive MPI Job on All Cores in the Allocation
This example assumes 2 cores per node.
$ mpirun -srun --jobid=150 -n8 hellompi
Hello! I'm rank 0 of 8 on n1
Hello! I'm rank 1 of 8 on n1
Hello! I'm rank 2 of 8 on n2
Hello! I'm rank 3 of 8 on n2
Hello! I'm rank 4 of 8 on n3
Hello! I'm rank 5 of 8 on n3
Hello! I'm rank 6 of 8 on n4
Hello! I'm rank 7 of 8 on n4
Alternatively, you can use the following:
$ export SLURM_JOBID=150
$ export SLURM_NPROCS=8
$ mpirun -srun hellompi
Hello! I'm rank 0 of 8 on n1
Hello! I'm rank 1 of 8 on n1
Hello! I'm rank 2 of 8 on n2
Hello! I'm rank 3 of 8 on n2
Hello! I'm rank 4 of 8 on n3
Hello! I'm rank 5 of 8 on n3
Hello! I'm rank 6 of 8 on n4
Hello! I'm rank 7 of 8 on n4
Use the following commands to launch a TotalView debugger session, assuming that TotalView
is installed and licensed and that ssh X forwarding is properly configured:
$ export SLURM_JOBID=150
$ export SLURM_NPROCS=4
$ mpirun -tv -srun [additional parameters as needed]
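For example, to debug the hellompi program used earlier in this section (a minimal sketch,
assuming the SLURM_JOBID and SLURM_NPROCS variables exported above are still set):
$ mpirun -tv -srun hellompi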
After you finish with this interactive allocation, exit the /bin/bash process in the first terminal;
this ends the LSF job.
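For example, in the first terminal, assuming the interactive /bin/bash shell is still the
current process:
$ exit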
Note:
If you exported any variables, such as SLURM_JOBID and SLURM_NPROCS, be sure to unset
them as follows before submitting any further jobs from the second terminal:
$ unset SLURM_JOBID
$ unset SLURM_NPROCS
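To confirm that the variables are no longer set, a quick check such as the following prints
only a blank line when both have been unset:
$ echo $SLURM_JOBID $SLURM_NPROCS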
You do not need to launch the /bin/bash shell to be able to interact with any compute node
resources; any running job will suffice. This is excellent for checking on long-running jobs. For
example, if you had submitted a CPU-intensive job, you could execute the uptime command
on all nodes in the allocation to confirm an expected high load on the nodes. The following is an
example of this; the LSF JOBID is 200 and the SLURM JOBID is 250:
$ srun --jobid=250 uptime
If you are concerned about holding the resources for too long, or about leaving them allocated
long after you have finished using them, you can submit a simple sleep job to limit the
allocation time, as follows:
$ bsub -n4 -ext "SLURM[nodes=4]" -o %J.out sleep 300
Job <125> is submitted to the default queue
After you confirm that this job is running by using the bjobs -l 125 command, you can work
interactively with the resources. If LSF-HPC observes no SLURM activity within the allocation
after the 5-minute sleep job finishes, it terminates the job. Any existing SLURM activity
(including MPI jobs running with the mpirun -srun command) is allowed to continue.
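For example, assuming the bjobs -l output reports the SLURM JOBID assigned to job 125 (shown
here as a placeholder), you could run a command across the allocation like this:
$ bjobs -l 125
$ srun --jobid=<SLURM JOBID from bjobs output> hostname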