HP StorageWorks Scalable File Share User Manual

5.7.1 Examining and Troubleshooting

If your file system is not operating properly, refer to the Lustre 1.8 Operations Manual,
Part III, Lustre Tuning, Monitoring and Troubleshooting. Many important commands for file
system operation and analysis, including lctl, lfs, tunefs.lustre, and debugfs, are described
in Part V, Reference. Some of the most useful diagnostic and troubleshooting commands are
also briefly described below.

5.7.1.1 On the Server

Use the following command to check the health of the system.

# cat /proc/fs/lustre/health_check
healthy

This returns healthy if there are no catastrophic problems. However, other less severe problems
that prevent proper operation might still exist.
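
For routine monitoring, the check above can be wrapped in a small function. The following is a minimal sketch, not part of the product: the function name check_lustre_health is hypothetical, and it assumes that any content other than the single word healthy indicates a problem.

```shell
# Hypothetical helper (not shipped with the product): report whether the
# Lustre health_check file contains exactly "healthy".
# Optional argument: path to the file; defaults to the path shown above.
check_lustre_health() {
    health_file="${1:-/proc/fs/lustre/health_check}"
    if [ ! -r "$health_file" ]; then
        echo "cannot read $health_file" >&2
        return 2
    fi
    status=$(cat "$health_file")
    if [ "$status" = "healthy" ]; then
        echo "OK: $status"
        return 0
    fi
    # Assumption: anything other than "healthy" is treated as a problem.
    echo "PROBLEM: $status"
    return 1
}
```

Calling check_lustre_health with no argument reads the default path; the return code (0, 1, or 2) can be used by a cron job or monitoring script. As noted above, a healthy result does not rule out less severe problems.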

Use the following command to show the LNET network identifiers (NIDs) active on the node.

# lctl list_nids
172.31.97.1@o2ib

Use the following command to show the Lustre network connections that the node is aware of,
some of which might not be currently active.

# cat /proc/sys/lnet/peers
nid              refs state max rtr min tx min queue
0@lo                1 ~rtr    0   0   0  0   0     0
172.31.97.2@o2ib    1 ~rtr    8   8   8  8   7     0
172.31.64.1@o2ib    1 ~rtr    8   8   8  8   6     0
172.31.64.2@o2ib    1 ~rtr    8   8   8  8   5     0
172.31.64.3@o2ib    1 ~rtr    8   8   8  8   5     0
172.31.64.4@o2ib    1 ~rtr    8   8   8  8   6     0
172.31.64.6@o2ib    1 ~rtr    8   8   8  8   6     0
172.31.64.8@o2ib    1 ~rtr    8   8   8  8   6     0
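
The peer table can be filtered to highlight peers that have come close to exhausting their send credits. The following is a sketch under stated assumptions: the function name low_credit_peers is hypothetical, the table is assumed to have the nine columns shown above, and column 8 (the second min) is assumed to be the lowest number of send (tx) credits seen for that peer. The threshold is an arbitrary value chosen by the caller.

```shell
# Hypothetical helper: list peers whose lowest-seen tx credits fell below a
# threshold. Assumes the nine-column layout shown above, with column 8
# (the second "min") holding the low-water mark for tx credits.
# Usage: low_credit_peers THRESHOLD [PEERS_FILE]
low_credit_peers() {
    threshold="$1"
    peers_file="${2:-/proc/sys/lnet/peers}"
    awk -v limit="$threshold" '
        NR == 1      { next }    # skip the header row
        $1 == "0@lo" { next }    # ignore the loopback entry
        $8 + 0 < limit + 0 { print $1, $8 }
    ' "$peers_file"
}
```

For example, low_credit_peers 6 prints the NID and the column 8 value for each peer whose value is below 6. With the sample output above, that would be the two peers showing 5.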

Use the following command on an MDS server or client to show the status of all file system
components. On an MGS or OSS server, it shows only the components running on that server.

# lctl dl
0 UP mgc MGC172.31.103.1@o2ib 81b13870-f162-80a7-8683-8782d4825066 5
1 UP mdt MDS MDS_uuid 3
2 UP lov hpcsfsc-mdtlov hpcsfsc-mdtlov_UUID 4
3 UP mds hpcsfsc-MDT0000 hpcsfsc-MDT0000_UUID 195
4 UP osc hpcsfsc-OST000f-osc hpcsfsc-mdtlov_UUID 5
5 UP osc hpcsfsc-OST000c-osc hpcsfsc-mdtlov_UUID 5
6 UP osc hpcsfsc-OST000d-osc hpcsfsc-mdtlov_UUID 5
7 UP osc hpcsfsc-OST000e-osc hpcsfsc-mdtlov_UUID 5
8 UP osc hpcsfsc-OST0008-osc hpcsfsc-mdtlov_UUID 5
9 UP osc hpcsfsc-OST0009-osc hpcsfsc-mdtlov_UUID 5
10 UP osc hpcsfsc-OST000b-osc hpcsfsc-mdtlov_UUID 5
11 UP osc hpcsfsc-OST000a-osc hpcsfsc-mdtlov_UUID 5
12 UP osc hpcsfsc-OST0005-osc hpcsfsc-mdtlov_UUID 5
13 UP osc hpcsfsc-OST0004-osc hpcsfsc-mdtlov_UUID 5
14 UP osc hpcsfsc-OST0006-osc hpcsfsc-mdtlov_UUID 5
15 UP osc hpcsfsc-OST0007-osc hpcsfsc-mdtlov_UUID 5
16 UP osc hpcsfsc-OST0001-osc hpcsfsc-mdtlov_UUID 5
17 UP osc hpcsfsc-OST0002-osc hpcsfsc-mdtlov_UUID 5
18 UP osc hpcsfsc-OST0000-osc hpcsfsc-mdtlov_UUID 5
19 UP osc hpcsfsc-OST0003-osc hpcsfsc-mdtlov_UUID 5
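
On a system with many OSTs, it can help to print only the devices that are not in the UP state. The following is a minimal sketch: the function name devices_not_up is hypothetical, and it assumes the device state is always the second whitespace-separated column, as in the output above.

```shell
# Hypothetical helper: read "lctl dl" output on stdin and print the index,
# state, type, and name of every device whose state (column 2) is not UP.
# Returns 1 if at least one such device was found, otherwise 0.
devices_not_up() {
    awk '
        NF == 0    { next }                       # skip blank lines
        $2 != "UP" { print $1, $2, $3, $4; bad = 1 }
        END        { exit bad + 0 }
    '
}
```

A typical invocation would be lctl dl | devices_not_up. No output and a return code of 0 mean that every listed device reported UP.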

Check the recovery status on an MDS or OSS server as follows:

# cat /proc/fs/lustre/*/*/recovery_status
INACTIVE
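
When a server hosts several targets, the wildcard above returns one result per target. The following sketch reports which files indicate a recovery in progress. The function name targets_in_recovery is hypothetical, and it assumes that a target in recovery reports the word RECOVERING in its recovery_status file; only the INACTIVE state is shown in this section.

```shell
# Hypothetical helper: print the path of each recovery_status file that
# contains the word RECOVERING (assumed marker for a recovery in progress).
# Arguments: one or more file paths. Returns 1 if any target is recovering.
targets_in_recovery() {
    found=0
    for f in "$@"; do
        [ -r "$f" ] || continue       # skip unreadable or unmatched paths
        if grep -q 'RECOVERING' "$f"; then
            echo "$f"
            found=1
        fi
    done
    return "$found"
}
```

A typical invocation would be targets_in_recovery /proc/fs/lustre/*/*/recovery_status. The printed paths identify the targets that are still recovering.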
