
10 Using LSF, 11 Advanced Topics, A Examples – HP XC System 4.x Software User Manual


9.7 Job Accounting
9.8 Fault Tolerance
9.9 Security

10 Using LSF

10.1 Information for LSF
10.2 Overview of LSF Integrated with SLURM
10.3 Differences Between LSF and LSF Integrated with SLURM
10.4 Job Terminology
10.5 Using LSF Integrated with SLURM in the HP XC Environment

10.5.1 Useful Commands
10.5.2 Job Startup and Job Control
10.5.3 Preemption

10.6 Submitting Jobs
10.7 LSF-SLURM External Scheduler
10.8 How LSF and SLURM Launch and Manage a Job
10.9 Determining the LSF Execution Host
10.10 Determining Available System Resources

10.10.1 Examining System Core Status
10.10.2 Getting Information About the LSF Execution Host Node
10.10.3 Getting Host Load Information
10.10.4 Examining System Queues
10.10.5 Getting Information About the lsf Partition

10.11 Getting Information About Jobs

10.11.1 Getting Job Allocation Information
10.11.2 Examining the Status of a Job
10.11.3 Viewing the Historical Information for a Job

10.12 Translating SLURM and LSF JOBIDs
10.13 Working Interactively Within an Allocation
10.14 LSF Equivalents of SLURM srun Options

11 Advanced Topics

11.1 Enabling Remote Execution with OpenSSH
11.2 Running an X Terminal Session from a Remote Node
11.3 Using the GNU Parallel Make Capability

11.3.1 Example Procedure 1
11.3.2 Example Procedure 2
11.3.3 Example Procedure 3

11.4 Local Disks on Compute Nodes
11.5 I/O Performance Considerations

11.5.1 Shared File View
11.5.2 Private File View

11.6 Communication Between Nodes
11.7 Using MPICH on the HP XC System

11.7.1 Using MPICH with SLURM Allocation
11.7.2 Using MPICH with LSF Allocation

A Examples

A.1 Building and Running a Serial Application
A.2 Launching a Serial Interactive Shell Through LSF
A.3 Running LSF Jobs with a SLURM Allocation Request

A.3.1 Example 1. Two Cores on Any Two Nodes
A.3.2 Example 2. Four Cores on Two Specific Nodes

