10 Using LSF-HPC, 11 Advanced Topics, A Examples – HP XC System 3.x Software User Manual

9.3.3 Using the srun Command with LSF-HPC

9.4 Monitoring Jobs with the squeue Command
9.5 Terminating Jobs with the scancel Command
9.6 Getting System Information with the sinfo Command
9.7 Job Accounting
9.8 Fault Tolerance
9.9 Security

10 Using LSF-HPC

10.1 Information for LSF-HPC
10.2 Overview of LSF-HPC Integrated with SLURM
10.3 Differences Between LSF-HPC and LSF-HPC Integrated with SLURM
10.4 Job Terminology
10.5 Using LSF-HPC Integrated with SLURM in the HP XC Environment

10.5.1 Useful Commands
10.5.2 Job Startup and Job Control
10.5.3 Preemption

10.6 Submitting Jobs
10.7 LSF-SLURM External Scheduler
10.8 How LSF-HPC and SLURM Launch and Manage a Job
10.9 Determining the LSF Execution Host
10.10 Determining Available System Resources

10.10.1 Examining System Core Status
10.10.2 Getting Information About the LSF Execution Host Node
10.10.3 Getting Host Load Information
10.10.4 Examining System Queues
10.10.5 Getting Information About the lsf Partition

10.11 Getting Information About Jobs

10.11.1 Getting Job Allocation Information
10.11.2 Examining the Status of a Job
10.11.3 Viewing the Historical Information for a Job

10.12 Translating SLURM and LSF-HPC JOBIDs
10.13 Working Interactively Within an Allocation
10.14 LSF-HPC Equivalents of SLURM srun Options

11 Advanced Topics

11.1 Enabling Remote Execution with OpenSSH
11.2 Running an X Terminal Session from a Remote Node
11.3 Using the GNU Parallel Make Capability

11.3.1 Example Procedure 1
11.3.2 Example Procedure 2
11.3.3 Example Procedure 3

11.4 Local Disks on Compute Nodes
11.5 I/O Performance Considerations

11.5.1 Shared File View
11.5.2 Private File View

11.6 Communication Between Nodes
11.7 Using MPICH on the HP XC System

11.7.1 Using MPICH with SLURM Allocation
11.7.2 Using MPICH with LSF Allocation

A Examples

A.1 Building and Running a Serial Application

Table of Contents