
7 errpt command, 8 hmc error logging, 9 multiple versions of mpi libraries – IBM pSeries User Manual

pshpstuningguidewp040105.doc

On the HMC GUI, select Service Applications -> Service Focal Point -> Select Serviceable
Events.

5.7 errpt command


On AIX 5L, the errpt command lists a summary of system error messages. Some of the HPS
subsystem errors are collected by errpt. To find out if you have hardware errors, you can either
run the errpt command on the node, or run it across nodes with the dsh command from the CSM manager:

dsh errpt | grep " 0223" | grep sysplanar0

(The value 0223 is the month and day.)
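The date in the pipeline above is hard-coded, so it has to be edited each day. The following is a minimal sketch of a reusable filter, assuming that the errpt summary lines arrive on standard input and that the date and resource name are passed as arguments. The function name filter_errpt is hypothetical and is not part of AIX or CSM.

```shell
# filter_errpt MMDD RESOURCE
# Hypothetical helper: reads errpt summary lines on stdin and keeps only
# those that contain " MMDD" (the start of the timestamp column) and the
# given resource name, mirroring the two grep stages shown above.
filter_errpt() {
    fe_mmdd=$1
    fe_resource=$2
    grep -e " ${fe_mmdd}" | grep -e "${fe_resource}"
}
```

For example, to look for today's sysplanar0 entries, you could run dsh errpt | filter_errpt "$(date +%m%d)" sysplanar0 from the CSM manager.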

You can also look at /var/adm/sni/sni_errpt_capture on the LPAR that is reporting the error.

If you see any errors from sni in the errpt listing, check the sni logs for more specific
information. The HPS logs are found in a set of directories under the /var/adm/sni directory.

5.8 HMC error logging

The HMC records errors in the /var/hsc/log directory. Here is an example of a command to
check for cyclic redundancy check (CRC) errors in the FNM_Recov.log file:

grep -i evtsum FNM_Recov.log | grep -i crc


In general, if Service Focal Point is working properly, you should not need to check the low-level
FNM logs such as the FNM_Recov file. However, for completeness, these are additional FNM
logs on the HMC:

FNM_Comm.log
FNM_Ice.log
FNM_Init.log
FNM_Route.log
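If you do need to look through the low-level logs, a quick per-file count can show where to start. The following is a rough sketch only: it assumes the logs match the pattern FNM_*.log in the log directory and that CRC problems are recorded with the string "crc" in every log, which this guide confirms only for FNM_Recov.log. The function name count_crc_lines is hypothetical.

```shell
# count_crc_lines [LOGDIR]
# Hypothetical helper: prints "<count> <file name>" for each FNM_*.log file
# in LOGDIR (default /var/hsc/log), where <count> is the number of lines
# that contain "crc" in any letter case.
count_crc_lines() {
    cc_logdir=${1:-/var/hsc/log}
    for cc_log in "$cc_logdir"/FNM_*.log; do
        # Skip the unexpanded pattern when no log files are present.
        [ -f "$cc_log" ] || continue
        printf '%s %s\n' "$(grep -ci crc "$cc_log")" "$(basename "$cc_log")"
    done
}
```

A file with a nonzero count can then be examined with the grep command shown earlier.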

Another debug command you can run on the HMC is lsswtopol -n 1 -p $PLANE_NUMBER.
For example, run the following command to check the link status for plane 0:

lsswtopol -n 1 -p0

If the lsswtopol command calls out links as "service required," but these links do not
show up in Service Focal Point, contact IBM service.

5.9 Multiple versions of MPI libraries

One common problem on clustered systems is having different MPI library levels on various
nodes. This can occur when a node is down for service while an upgrade is made, or when there
are multiple versions of the libraries for each node and the links are broken. To check the library
levels across a large system, use the following dsh commands:

For LAPI libraries:

dsh sum /opt/rsct/lapi/lib/liblapi_r.a (or run with MP_INFOLEVEL=2)
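On a large system, the raw dsh output is hard to compare by eye. The following is a minimal sketch that groups nodes by checksum, assuming each dsh output line has the form "node: checksum blocks path" (the node name prefix followed by the usual sum output). The function name summarize_sums is hypothetical.

```shell
# summarize_sums
# Hypothetical helper: reads "node: checksum blocks [path]" lines on stdin
# and prints one line per distinct checksum/block-count pair, with the
# number of nodes that reported it and their names.
summarize_sums() {
    awk '{
        node = $1
        sub(/:$/, "", node)          # drop the trailing colon from the node name
        key = $2 " " $3              # checksum and block count identify a library level
        nodes[key] = nodes[key] " " node
        count[key]++
    }
    END {
        for (k in count)
            printf "%d node(s) with sum %s:%s\n", count[k], k, nodes[k]
    }'
}
```

For example, run dsh sum /opt/rsct/lapi/lib/liblapi_r.a | summarize_sums. If more than one line is printed, the nodes in the smaller groups are the ones to check for a different library level.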