1 using mpich with slurm allocation, 2 using mpich with lsf allocation, Mpich wrapper script – HP XC System 3.x Software User Manual
Verify with your system administrator that MPICH has been installed on your system. The HP XC System
Software Administration Guide provides procedures for setting up MPICH.
MPICH jobs must not run on nodes allocated to other tasks. HP strongly recommends that all MPICH jobs
request node allocation through either SLURM or LSF and that MPICH jobs restrict themselves to using
only those resources in the allocation.
Launch MPICH jobs using a wrapper script, such as the one shown in Figure 11-1. The following subsections
describe how to launch MPICH jobs from a wrapper script with SLURM or LSF, respectively. These
subsections are not full solutions for integrating MPICH with the HP XC System Software.
Figure 11-1 MPICH Wrapper Script
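#!/bin/csh
# Record each allocated node once, tagged with two processes per node.
srun csh -c 'echo `hostname`:2' | sort | uniq > machinelist
# Use the first node in the list as the launch host.
set hostname = `head -1 machinelist | awk -F: '{print $1}'`
# Start mpirun on that node, passing it the generated machine file.
ssh $hostname /opt/mpich/bin/mpirun options... -machinefile machinelist a.out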
The wrapper script is based on the following assumptions:
• Each node in the HP XC system contains two CPUs.
• The current working directory is available on all nodes on which an MPICH job might run.
• You provide the mpirun options that are appropriate to your requirements.
• The executable file is named a.out.
• The wrapper script has the appropriate permissions.
You need to modify the wrapper script accordingly if these assumptions are not true.
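For example, if your nodes do not have two CPUs each, the machine list must carry a different per-node count. The following is a minimal sketch of one way to parameterize that step. It is written in POSIX sh rather than csh, and the function name make_machinelist is hypothetical; it is not part of MPICH or the HP XC software. It assumes hostnames arrive on standard input, one per line.

```sh
#!/bin/sh
# Hypothetical helper (not part of MPICH or HP XC): read hostnames on
# stdin and print one "host:N" machine-file entry per distinct host.
# Usage: make_machinelist CPUS_PER_NODE   (defaults to 2, as in Figure 11-1)
make_machinelist() {
    cpus=${1:-2}
    # sort -u collapses duplicate hostnames; awk appends ":<cpus>" to each
    # non-empty line.
    sort -u | awk -v n="$cpus" 'NF { print $1 ":" n }'
}
```

Inside an allocation, a line such as `srun hostname | make_machinelist 4 > machinelist` could then stand in for the first command of the wrapper on four-CPU nodes; adjust it to your site.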
11.7.1 Using MPICH with SLURM Allocation
The SLURM-based allocation method uses the srun command to spawn a shell; the remote job is run from
within the shell, as shown here:
% srun -A options    (1)
% ./wrapper          (2)
% exit               (3)
NOTE: This method assumes that the communication among nodes is performed using ssh and that
passwords are not required.
(1) The srun -A command allocates the resources and spawns a new shell without starting a remote
    job. For more information on the -A option, see srun(1).
    IMPORTANT: Be sure that the numbers of nodes and processors requested in the srun command
    match the numbers specified in the wrapper script.
(2) This command line executes the wrapper script to start the job on the allocated nodes.
(3) After the MPICH job specified by the wrapper completes, the exit command terminates the shell
    and releases the allocated nodes.
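The requirement that the srun request and the wrapper script agree can be checked before mpirun is started. The following is a rough sketch in POSIX sh, not a procedure from this manual. The function names count_slots and check_allocation are hypothetical, and the sketch assumes the machine file uses host:N entries as in Figure 11-1, treating an entry without a count as one process.

```sh
#!/bin/sh
# Hypothetical helper: sum the ":N" process counts in a machine file.
# Assumption: an entry with no ":N" suffix counts as one process.
count_slots() {
    awk -F: 'NF { total += ($2 == "" ? 1 : $2) } END { print total + 0 }' "$1"
}

# Hypothetical check: succeed only if the machine file provides exactly
# the expected number of processes.
# Usage: check_allocation EXPECTED MACHINEFILE
check_allocation() {
    expected=$1
    file=$2
    actual=$(count_slots "$file")
    if [ "$actual" -ne "$expected" ]; then
        echo "machine file provides $actual slots, expected $expected" >&2
        return 1
    fi
}
```

A wrapper could call `check_allocation 8 machinelist || exit 1` after it writes the machine list, where 8 stands for whatever process count you requested.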
11.7.2 Using MPICH with LSF Allocation
The LSF-based allocation method uses a single bsub command to create an allocation, as shown here:
% bsub -I options... wrapper
The bsub command launches the wrapper script.