HP XC System 3.x Software User Manual, page 5: 6 Debugging Applications, 7 Monitoring Node Activity, 8 Tuning Applications, 9 Using SLURM, 10 Using LSF-HPC
5.4 Submitting a Batch Job or Job Script ..... 53
5.5 Submitting a Job from a Host Other Than an HP XC Host ..... 55
5.6 Running Preexecution Programs ..... 56
6.2.1.1 SSH and TotalView ..... 58
6.2.1.2 Setting Up TotalView ..... 58
6.2.1.3 Using TotalView with SLURM ..... 58
6.2.1.4 Using TotalView with LSF-HPC ..... 59
6.2.1.5 Setting TotalView Preferences ..... 59
6.2.1.6 Debugging an Application ..... 59
6.2.1.7 Debugging Running Applications ..... 60
6.2.1.8 Exiting TotalView ..... 61
7.1 Installing the Node Activity Monitoring Software ..... 63
7.2 Using the xcxclus Utility to Monitor Nodes ..... 63
7.3 Plotting the Data from the xcxclus Datafiles ..... 65
7.4 Using the xcxperf Utility to Display Node Performance ..... 66
7.5 Plotting the Node Performance Data ..... 67
7.6 Running Performance Health Tests ..... 68
8.1.1 Building a Program – Intel Trace Collector and HP-MPI ..... 73
8.1.2 Running a Program – Intel Trace Collector and HP-MPI ..... 74
9.4 Monitoring Jobs with the squeue Command ..... 80
9.5 Terminating Jobs with the scancel Command ..... 81
9.6 Getting System Information with the sinfo Command ..... 81
9.7 Job Accounting ..... 81
9.8 Fault Tolerance ..... 82
9.9 Security ..... 82
10.1 Information for LSF-HPC ..... 83
10.2 Overview of LSF-HPC Integrated with SLURM ..... 84
10.3 Differences Between LSF-HPC and LSF-HPC Integrated with SLURM ..... 85
10.4 Job Terminology ..... 86
Table of Contents