HP StorageWorks Scalable File Share User Manual


Unmount the file system that uses the device on all client nodes.

The /proc/fs/lustre/mds/*/num_exports counter on the MDS server must be 0 (zero). This counter is set internally when client nodes mount and unmount file systems.
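The counter check can be scripted on the MDS server. The following is a minimal sketch, not part of the SFS tooling: the function name check_num_exports and its optional base-directory argument are assumptions made for illustration; only the /proc path comes from the text above.

```shell
#!/bin/sh
# Sketch: report MDS services that still have client exports.
# The base directory is a parameter only so that the function can be tried
# against a test directory; on an MDS server the default path applies.
check_num_exports() {
    base="${1:-/proc/fs/lustre/mds}"
    rc=0
    for f in "$base"/*/num_exports; do
        [ -r "$f" ] || continue        # no matching counter file here
        n=$(cat "$f")
        if [ "$n" -ne 0 ]; then
            echo "$f: $n (clients still connected)"
            rc=1
        fi
    done
    return $rc
}
```

Calling check_num_exports with no argument returns 0 only when every counter it finds reads 0, so it can guard the next step in a script.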

Stop the file system by entering the command shown in the following example, where the file system is called test:

sfs> stop filesystem test

Verify that the file system is stopped by entering the show filesystem command. The services must be in the stopped or down state. If a service is in any other state, enter the stop filesystem command again or shut down the server where the service is running.

Check the /proc/mounts file and the /proc/fs/lustre/obdfilter/mntdev (or /proc/fs/lustre/mds/mntdev) file on each server to ensure that the file system devices are not mounted at any mount point.
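The /proc/mounts part of this check can be sketched as a small shell function. The function name is_mounted and its optional second argument are assumptions made for illustration; the device path in the usage note below is the one that appears in the log example in Section 9.26.7 and stands in for whichever device you intend to check.

```shell
#!/bin/sh
# is_mounted DEVICE [MOUNTS_FILE]
# Returns 0 if DEVICE appears as the first field of any line in the
# mounts file (default /proc/mounts), and 1 otherwise.
is_mounted() {
    dev="$1"
    mounts="${2:-/proc/mounts}"
    # The first whitespace-separated field of each line is the device.
    awk -v d="$dev" '$1 == d { found = 1 } END { exit !found }' "$mounts"
}
```

For example, running is_mounted /dev/hpls/dev17a on each server should return a non-zero status before you proceed. The mntdev files still need to be inspected separately.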

CAUTION:

Before you run the e2fsck-lfsck command on a file system device, you must ensure that the file system that uses the device is fully stopped and the device is not mounted on any node. Running the e2fsck-lfsck command on a file system device that is mounted or actively being used may corrupt the file system.

show ost ost_name command to identify the preferred server for an OST service.)

If the service is not mirrored, skip this step.

If the service is mirrored, identify the underlying RAID device and start it, as follows:

a. Examine the raidtab file used to create the mirrored service, as shown in the following example:

# cat /var/raid/raidtab.mds3
ARRAY /dev/md1
.

In this example, the underlying RAID device is /dev/md1.

b. Start the underlying RAID device by entering the following command:

# mdadm --assemble --config /var/raid/raidtab.mds3 /dev/md10
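Reading the device name from the raidtab file, rather than typing it by hand, keeps the two steps consistent. The following is a minimal sketch; the function name raid_device is hypothetical, and it assumes the file contains a line whose first field is ARRAY and whose second field is the device, as in the example above.

```shell
#!/bin/sh
# raid_device RAIDTAB
# Prints the device named on the first ARRAY line of the given raidtab file.
raid_device() {
    awk '$1 == "ARRAY" { print $2; exit }' "$1"
}

# Possible use on the server (not executed here):
#   dev=$(raid_device /var/raid/raidtab.mds3)
#   mdadm --assemble --config /var/raid/raidtab.mds3 "$dev"
```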

You can now run the e2fsck-lfsck command manually using the same arguments as would be used by the standard e2fsck command to repair a standard ext3/ldiskfs file system.

9.26.6 MDS or OST services stay in the recovering state

If an MDS or OST service remains in the recovering state for a long time, check whether the service has actually started the recovery process, as described in Section 9.26.7.

9.26.7 MDS and OST service recovery process

This section describes the process that takes place when a service fails over from a server to the peer server.

When an MDS or OST service starts up and determines that it has client recovery information that was recorded during an earlier operation of the service, it reports messages similar to the following:

Lustre: OST south-ost12 now serving /dev/hpls/dev17a (e6cf0bf5-a180-46d5-b6ea-cea8947013c1),
but will be in recovery until 9 clients reconnect, or if no clients reconnect for 5:00;
during that time new clients will not be allowed to connect. Recovery progress can be
monitored by watching /proc/fs/lustre/obdfilter/south-ost12/recovery_status.
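The recovery_status file named in the message can be polled from a shell. The following is a minimal sketch under stated assumptions: the function names are hypothetical, and it assumes that the file contains a line of the form "status: RECOVERING" while recovery is in progress. This manual does not describe the file's layout, so check the actual contents on your system first.

```shell
#!/bin/sh
# recovery_state FILE
# Prints the value of the "status:" line in FILE, if there is one.
# (The "status:" field name is an assumption about the file layout.)
recovery_state() {
    awk '$1 == "status:" { print $2; exit }' "$1"
}

# wait_for_recovery FILE [INTERVAL_SECONDS]
# Prints FILE every INTERVAL seconds for as long as the state is RECOVERING.
wait_for_recovery() {
    file="$1"
    interval="${2:-10}"
    while [ "$(recovery_state "$file")" = "RECOVERING" ]; do
        cat "$file"
        sleep "$interval"
    done
}
```

For the service in the example above, the call would be wait_for_recovery /proc/fs/lustre/obdfilter/south-ost12/recovery_status.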