
HP StorageWorks Scalable File Share User Manual: 5.7 Testing Your Configuration, 5.7.1 Examining and Troubleshooting, 5.7.1.1 On the Server


If you cannot start a resource on a node, check that node for values of -INFINITY in
/var/lib/heartbeat/crm/cib.xml. There should be none. For more details, see the crm_resource
manpage. See also http://www.linux-ha.org/Heartbeat.
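The check for -INFINITY values can be scripted. The following is a minimal Python sketch, not part of the HP or Heartbeat tooling. It assumes only that cib.xml is well-formed XML, and it reports every attribute whose value is exactly -INFINITY without interpreting the CIB schema. The function names find_neg_infinity and check_cib are illustrative.

```python
import xml.etree.ElementTree as ET

CIB_PATH = "/var/lib/heartbeat/crm/cib.xml"  # path given in this manual


def find_neg_infinity(xml_text):
    """Return (tag, attribute, id) for every attribute equal to -INFINITY.

    Hypothetical helper: it scans all elements generically and assumes
    nothing about the CIB schema beyond well-formed XML.
    """
    hits = []
    for elem in ET.fromstring(xml_text).iter():
        for name, value in elem.attrib.items():
            if value.strip() == "-INFINITY":
                hits.append((elem.tag, name, elem.get("id", "")))
    return hits


def check_cib(path=CIB_PATH):
    """Read the CIB file on this node and return its -INFINITY entries."""
    with open(path, encoding="utf-8") as f:
        return find_neg_infinity(f.read())
```

Calling check_cib() on the affected node, with permission to read the file, should return an empty list. Any tuple it returns names the element and attribute to inspect.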

5.7 Testing Your Configuration

The best way to test your Lustre file system is to perform normal file system operations with
standard Linux shell commands such as df, cd, and ls. To measure the performance of your
installation, you can use your own application or the standard file system performance
benchmarks described in Chapter 17, Benchmarking, of the Lustre 1.8 Operations Manual at:
http://manual.lustre.org/images/7/7f/820-3681_v1_1.pdf.
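The basic operations described above can also be run as a repeatable check. The following Python sketch is an illustration, not an HP or Lustre tool. It queries free space, creates a scratch directory, writes a file, lists it, reads it back, and cleans up. The function name smoke_test and the mount point passed to it are assumptions. It measures nothing about performance.

```python
import os
import shutil
import tempfile


def smoke_test(mount_point, payload=b"lustre smoke test\n"):
    """Exercise basic file operations under mount_point.

    Returns the free space in bytes reported for the file system.
    Raises OSError or RuntimeError if any step fails.
    """
    usage = shutil.disk_usage(mount_point)        # rough equivalent of df
    workdir = tempfile.mkdtemp(prefix="smoke_", dir=mount_point)
    try:
        path = os.path.join(workdir, "testfile")
        with open(path, "wb") as f:               # create and write
            f.write(payload)
        if os.listdir(workdir) != ["testfile"]:   # rough equivalent of ls
            raise RuntimeError("unexpected directory listing")
        with open(path, "rb") as f:               # read back and compare
            if f.read() != payload:
                raise RuntimeError("data read back does not match")
    finally:
        shutil.rmtree(workdir)                    # always clean up
    return usage.free
```

For example, smoke_test("/mnt/lustre") runs the check on a client, where /mnt/lustre is a placeholder for the directory on which your file system is mounted.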

5.7.1 Examining and Troubleshooting

If your file system is not operating properly, you can refer to information in the Lustre 1.8
Operations Manual, PART III Lustre Tuning, Monitoring and Troubleshooting. Many important
commands for file system operation and analysis are described in the Part V Reference section,
including lctl, lfs, tunefs.lustre, and debugfs. Some of the most useful diagnostic and
troubleshooting commands are also briefly described below.

5.7.1.1 On the Server

Use the following command to check the health of the system.

# cat /proc/fs/lustre/health_check
healthy

The command returns healthy if there are no catastrophic problems. However, less severe problems
that prevent proper operation might still exist.
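For use in a monitoring script, the health check can be wrapped as shown in the following Python sketch. It is an illustration only. The function names are assumptions, and anything other than the exact word healthy is treated as a failure.

```python
HEALTH_FILE = "/proc/fs/lustre/health_check"  # path used in this manual


def is_healthy(text):
    """True only if the health file content is exactly 'healthy'."""
    return text.strip() == "healthy"


def read_health(path=HEALTH_FILE):
    """Return (healthy, raw_text) for the Lustre health file on this node."""
    with open(path) as f:
        raw = f.read()
    return is_healthy(raw), raw.strip()
```

A True result means only that no catastrophic problem was reported. It does not rule out the less severe problems mentioned above.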

Use the following command to show the LNET network identifier (NID) of each interface active on the node.

# lctl list_nids
172.31.97.1@o2ib
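Each line of this output has the form address@network. The following Python sketch, offered as an illustration, splits such a value into its two parts and collects the values for the local node. The function names are assumptions, and local_nids requires lctl to be on the PATH of the user running it.

```python
import subprocess


def split_nid(nid):
    """Split 'address@network' into (address, network)."""
    address, sep, network = nid.strip().partition("@")
    if not sep or not address or not network:
        raise ValueError(f"not a NID: {nid!r}")
    return address, network


def local_nids():
    """Run 'lctl list_nids' and return a list of (address, network) pairs."""
    out = subprocess.run(["lctl", "list_nids"], check=True,
                         capture_output=True, text=True).stdout
    return [split_nid(line) for line in out.splitlines() if line.strip()]
```

With the output shown above, split_nid("172.31.97.1@o2ib") gives the address 172.31.97.1 and the network o2ib.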

Use the following command to show the Lustre network connections that the node is aware of,
some of which might not be currently active.

# cat /proc/sys/lnet/peers
nid                 refs  state  max  rtr  min   tx  min  queue
0@lo                   1   ~rtr    0    0    0    0    0      0
172.31.97.2@o2ib       1   ~rtr    8    8    8    8    7      0
172.31.64.1@o2ib       1   ~rtr    8    8    8    8    6      0
172.31.64.2@o2ib       1   ~rtr    8    8    8    8    5      0
172.31.64.3@o2ib       1   ~rtr    8    8    8    8    5      0
172.31.64.4@o2ib       1   ~rtr    8    8    8    8    6      0
172.31.64.6@o2ib       1   ~rtr    8    8    8    8    6      0
172.31.64.8@o2ib       1   ~rtr    8    8    8    8    6      0
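Because the header repeats the column name min, the table is easier to work with once parsed. The following Python sketch is an illustration only. It assumes that the first min column belongs to rtr and the second to tx. The labels min_rtr and min_tx are this sketch's own and do not appear in the file.

```python
PEERS_FILE = "/proc/sys/lnet/peers"  # path used in this manual

# Assumed column meanings; the suffixed names are labels invented here.
FIELDS = ("nid", "refs", "state", "max", "rtr",
          "min_rtr", "tx", "min_tx", "queue")


def parse_peers(text):
    """Parse the peers table into a list of dicts, skipping the header."""
    peers = []
    for line in text.splitlines():
        parts = line.split()
        if len(parts) != len(FIELDS) or parts[0] == "nid":
            continue  # header, blank, or unexpected line
        row = dict(zip(FIELDS, parts))
        for key in FIELDS:
            if key not in ("nid", "state"):
                row[key] = int(row[key])
        peers.append(row)
    return peers


def lowest_tx(peers):
    """Return peers sorted by the minimum tx value seen, lowest first."""
    return sorted(peers, key=lambda p: p["min_tx"])


def read_peers(path=PEERS_FILE):
    """Read and parse the peers file on this node."""
    with open(path) as f:
        return parse_peers(f.read())
```

For example, lowest_tx(read_peers()) lists first the connections whose second min column has dropped furthest.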

Use the following command on an MDS server or client to show the status of all file system
components. On an MGS or OSS server, the command shows only the components running on
that server.

# lctl dl
0 UP mgc MGC172.31.103.1@o2ib 81b13870-f162-80a7-8683-8782d4825066 5
1 UP mdt MDS MDS_uuid 3
2 UP lov hpcsfsc-mdtlov hpcsfsc-mdtlov_UUID 4
3 UP mds hpcsfsc-MDT0000 hpcsfsc-MDT0000_UUID 195
4 UP osc hpcsfsc-OST000f-osc hpcsfsc-mdtlov_UUID 5
5 UP osc hpcsfsc-OST000c-osc hpcsfsc-mdtlov_UUID 5
6 UP osc hpcsfsc-OST000d-osc hpcsfsc-mdtlov_UUID 5
7 UP osc hpcsfsc-OST000e-osc hpcsfsc-mdtlov_UUID 5
8 UP osc hpcsfsc-OST0008-osc hpcsfsc-mdtlov_UUID 5
9 UP osc hpcsfsc-OST0009-osc hpcsfsc-mdtlov_UUID 5
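To pick out components that are not UP, the listing can be parsed. The following Python sketch is an illustration only. It assumes that each line has six whitespace-separated columns, read here as device number, status, type, name, UUID, and reference count. The field names devno, kind, and refs are labels chosen for this sketch.

```python
import subprocess
from collections import namedtuple

# Field names are this sketch's own labels for the six columns.
Device = namedtuple("Device", "devno status kind name uuid refs")


def parse_device_list(text):
    """Parse 'lctl dl' output lines into Device tuples."""
    devices = []
    for line in text.splitlines():
        parts = line.split()
        if len(parts) != 6 or not (parts[0].isdigit() and parts[5].isdigit()):
            continue  # skip blank or unexpected lines
        devno, status, kind, name, uuid, refs = parts
        devices.append(Device(int(devno), status, kind, name, uuid, int(refs)))
    return devices


def not_up(devices):
    """Return the devices whose status column is anything other than UP."""
    return [d for d in devices if d.status != "UP"]


def device_list():
    """Run 'lctl dl' on this node and return the parsed devices."""
    out = subprocess.run(["lctl", "dl"], check=True,
                         capture_output=True, text=True).stdout
    return parse_device_list(out)
```

With the listing shown above, not_up(device_list()) returns an empty list, because every component reports UP.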
