9.25.1.1 Determining whether Voltaire InfiniBand interconnect is loaded ...........................................9-16
9.25.1.2 Starting, stopping, and auto-starting the Voltaire InfiniBand interconnect ...............................9-17
9.25.1.2.1 Using the ib-setup utility..............................................................................................9-17
9.25.1.2.2 From the command line..............................................................................................9-18
9.25.1.3 Server hangs when Voltaire InfiniBand interconnect is started ..............................................9-18
9.25.2 Voltaire HCA adapter is not recognized ................................................................................9-18
9.25.3 Voltaire HCA adapter is not activated ...................................................................................9-19
9.25.4 Connection and data transfer problems .................................................................................9-20
9.25.5 AD_TAVOR : vvi_mlx_poll_for_completion messages...............................................................9-20
9.26 Troubleshooting file systems......................................................................................................9-21
9.26.1 Problems creating a file system.............................................................................................9-21
9.26.2 Identifying servers serving OST services.................................................................................9-22
9.26.3 The start filesystem command may fail twice...........................................................................9-22
9.26.4 Troubleshooting the stop filesystem command.........................................................................9-23
9.26.5 Using the MPI Lustre repair utility to repair file systems ............................................................9-24
9.26.5.1 Using the repair-lfsck script and the generated file system-specific shell scripts .......................9-25
9.26.5.1.1 Running a generated file system repair script ................................................................9-26
9.26.5.2 Repairing or verifying individual MDS or OST services .......................................................9-27
9.26.6 MDS or OST services stay in the recovering state....................................................................9-28
9.26.7 MDS and OST service recovery process ................................................................................9-28
9.26.8 Rebalancing file system services ...........................................................................................9-30
9.26.9 Troubleshooting supplementary groups access........................................................................9-31
9.27 Troubleshooting file system performance ....................................................................................9-32
9.27.1 Performance troubleshooting ................................................................................................9-32
9.27.2 Verifying file striping ...........................................................................................................9-36
9.27.2.1 Recreating files ..............................................................................................................9-38
9.27.3 Checking for unbalanced distribution of OST services .............................................................9-39
9.27.4 Checking for unbalanced controllers in EVA4000 arrays.........................................................9-40
9.27.5 Examining the system logs for errors .....................................................................................9-41
9.27.6 Examining EVA4000 storage subsystems for errors.................................................................9-42
9.27.7 Examining SFS20 storage subsystems for errors......................................................................9-42
9.27.8 Examining the interconnect switch for errors...........................................................................9-43
9.27.9 Verifying performance statistics on Fibre Channel switches ......................................................9-44
9.27.10 Troubleshooting slow commit messages .................................................................................9-46
9.28 Troubleshooting EVA4000 array connectivity..............................................................................9-47
9.29 Troubleshooting LUN presentation .............................................................................................9-49
9.30 Accessing consoles..................................................................................................................9-51
9.31 Accessing the iLO component ...................................................................................................9-51
9.31.1 Configuring the iLO component............................................................................................9-51
9.31.1.1 Using the remote console ................................................................................................9-51
9.31.1.2 Using a Web browser ....................................................................................................9-52
9.31.2 Troubleshooting iLO access..................................................................................................9-52
9.32 Troubleshooting licenses...........................................................................................................9-53
9.33 Troubleshooting failed SFS20 arrays..........................................................................................9-54
9.33.1 Identifying failed SFS20 arrays.............................................................................................9-54
9.33.2 Recovering from a temporary SFS20 array failure...................................................................9-56
9.33.3 Recovering degraded MDS or OST services...........................................................................9-56
9.34 Handling disk errors on SFS20 storage......................................................................................9-59
9.34.1 Disks showing the removed/failed state.................................................................................9-60
9.34.2 Disks showing the predict fail state........................................................................................9-60
9.34.3 Disks showing the logging errors state...................................................................................9-60
9.35 Recovering degraded MDS services on systems using EVA4000 storage........................................9-61
9.36 System log files .......................................................................................................................9-64
9.37 Administration service restarts every one minute (attempting to start the evlogd daemon)..................9-64
9.38 The MDS service fails with an ASSERTION(ino ==inode->i_ino) message .......................................9-65
9.39 The MDS service repeatedly crashes with an LBUG error..............................................................9-65
9.40 Rebuilding logical drives after disk failures .................................................................................9-66
9.41 Determining if the Network ID of a server on a Quadrics or Myrinet interconnect has been changed ....9-68
9.42 Troubleshooting client mount failures..........................................................................................9-69