
6 Debugging Applications, 7 Monitoring Node Activity, 8 Tuning Applications – HP XC System 4.x Software User Manual



5.2 Submitting a Serial Job Using LSF...................................................................................................49

5.2.1 Submitting a Serial Job with the LSF bsub Command............................................................49
5.2.2 Submitting a Serial Job Through SLURM Only......................................................................50

5.3 Submitting a Parallel Job.................................................................................................................51

5.3.1 Submitting a Non-MPI Parallel Job.........................................................................................51
5.3.2 Submitting a Parallel Job That Uses the HP-MPI Message Passing Interface.........................52
5.3.3 Submitting a Parallel Job Using the SLURM External Scheduler...........................................53

5.4 Submitting a Batch Job or Job Script...............................................................................................56
5.5 Submitting Multiple MPI Jobs Across the Same Set of Nodes........................................................58

5.5.1 Using a Script to Submit Multiple Jobs...................................................................................58
5.5.2 Using a Makefile to Submit Multiple Jobs..............................................................................58

5.6 Submitting a Job from a Host Other Than an HP XC Host.............................................................61
5.7 Running Preexecution Programs....................................................................................................61

6 Debugging Applications.............................................................................................63

6.1 Debugging Serial Applications.......................................................................................................63
6.2 Debugging Parallel Applications....................................................................................................63

6.2.1 Debugging with TotalView.....................................................................................................64

6.2.1.1 SSH and TotalView..........................................................................................................64
6.2.1.2 Setting Up TotalView......................................................................................................64
6.2.1.3 Using TotalView with SLURM........................................................................................65
6.2.1.4 Using TotalView with LSF...............................................................................................65
6.2.1.5 Setting TotalView Preferences.........................................................................................65
6.2.1.6 Debugging an Application..............................................................................................66
6.2.1.7 Debugging Running Applications..................................................................................67
6.2.1.8 Exiting TotalView............................................................................................................67

7 Monitoring Node Activity............................................................................................69

7.1 The Xtools Utilities..........................................................................................................................69
7.2 Running Performance Health Tests.................................................................................................70

8 Tuning Applications.....................................................................................................75

8.1 Using the Intel Trace Collector and Intel Trace Analyzer...............................................................75

8.1.1 Building a Program – Intel Trace Collector and HP-MPI......................................................75
8.1.2 Running a Program – Intel Trace Collector and HP-MPI.......................................................76

8.2 The Intel Trace Collector and Analyzer with HP-MPI on HP XC...................................................77

8.2.1 Installation Kit.........................................................................................................................77
8.2.2 HP-MPI and the Intel Trace Collector.....................................................................................77

8.3 Visualizing Data – Intel Trace Analyzer and HP-MPI....................................................................79

9 Using SLURM................................................................................................................81

9.1 Introduction to SLURM...................................................................................................................81
9.2 SLURM Utilities...............................................................................................................................81
9.3 Launching Jobs with the srun Command.......................................................................................81

9.3.1 The srun Roles and Modes......................................................................................................82

9.3.1.1 The srun Roles.................................................................................................................82
9.3.1.2 The srun Modes...............................................................................................................82

9.3.2 Using the srun Command with HP-MPI................................................................................82
9.3.3 Using the srun Command with LSF........................................................................................82

9.4 Monitoring Jobs with the squeue Command..................................................................................82
9.5 Terminating Jobs with the scancel Command.................................................................................83
9.6 Getting System Information with the sinfo Command...................................................................83

Table of Contents
