HP StorageWorks Scalable File Share User Manual
Troubleshooting
2. Unmount the file system that uses the device on all client nodes.
The /proc/fs/lustre/mds/*/num_exports counter on the MDS server must be 0 (zero). This counter is set internally when client nodes mount and unmount file systems.
3. Stop the file system by entering the command shown in the following example, where the file system is called test:
sfs> stop filesystem test
4. Verify that the file system is stopped by entering the show filesystem command. The services must be in the stopped or down state. If a service is in any other state, enter the stop filesystem command again or shut down the server where the service is running.
5. Check the /proc/mounts file and the /proc/fs/lustre/obdfilter/mntdev (or /proc/fs/lustre/mds/mntdev) file on each server to ensure that the file system devices are not mounted at any mount point.
CAUTION: Before you run the e2fsck-lfsck command on a file system device, you must ensure that the file system that uses the device is fully stopped and the device is not mounted on any node. Running the e2fsck-lfsck command on a file system device that is mounted or actively being used may corrupt the file system.
6. Log in to the preferred server for the service. (Use the show ost ost_name command to identify the preferred server for an OST service.)
7. If the service is not mirrored, skip this step.
If the service is mirrored, identify the underlying RAID device and start it, as follows:
a. Examine the raidtab file used to create the mirrored service, as shown in the following example:
# cat /var/raid/raidtab.mds3
ARRAY /dev/md1
.
.
.
In this example, the underlying RAID device is /dev/md1.
b. Start the underlying RAID device by entering the following command:
# mdadm --assemble --config /var/raid/raidtab.mds3 /dev/md1
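Step 7a can also be done without reading the file by eye. The following is a minimal sketch, assuming the raidtab file uses the mdadm configuration layout seen in the example above, where each ARRAY line names the RAID device in its second field. The function name raid_devices is hypothetical and is not part of the SFS software.

```shell
# Print the device named on each ARRAY line of an mdadm-style config file.
# Assumption: the second whitespace-separated field of an ARRAY line is
# the RAID device, as in the "ARRAY /dev/md1" example above.
raid_devices() {
    awk '$1 == "ARRAY" { print $2 }' "$1"
}

# Example use on the server (not run here):
#   raid_devices /var/raid/raidtab.mds3
```

If the file defines more than one array, the function prints one device per line, so confirm which device belongs to the service before assembling it.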
You can now run the e2fsck-lfsck command manually, using the same arguments that you would use with the standard e2fsck command to repair a standard ext3/ldiskfs file system.
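Before running e2fsck-lfsck, it may be worth re-checking the two conditions from steps 2 and 5. The following is a minimal sketch of such a check, not an HP-supplied tool. The function names total_exports and is_mounted are hypothetical, and the sketch assumes that each num_exports file contains a single integer and that /proc/mounts lists the device path in its first field.

```shell
# Sum the num_exports counters in the files given as arguments.
# Unreadable or missing files (for example, an unmatched glob) are skipped.
total_exports() {
    total=0
    for f in "$@"; do
        [ -r "$f" ] || continue
        n=$(cat "$f")
        total=$((total + ${n:-0}))
    done
    echo "$total"
}

# Succeed (exit status 0) if device $1 is listed as a mount source in the
# mounts file $2; the file defaults to /proc/mounts.
is_mounted() {
    awk -v dev="$1" '$1 == dev { found = 1 } END { exit !found }' \
        "${2:-/proc/mounts}"
}

# Example use on a server (not run here; /dev/md1 is only an example device):
#   total_exports /proc/fs/lustre/mds/*/num_exports    # should print 0
#   is_mounted /dev/md1 && echo "device is still mounted"
```

A total other than 0, or a device that is still listed in /proc/mounts, means that the conditions in the CAUTION above are not yet met.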
9.26.6 MDS or OST services stay in the recovering state
If an MDS or OST service remains in the recovering state for a long time, check whether the service has actually started the recovery process, as described in Section 9.26.7.
9.26.7 MDS and OST service recovery process
This section describes the process that takes place when a service fails over from a server to the peer server.
1. When an MDS or OST service starts up and determines that it has client recovery information that was recorded during an earlier operation of the service, it reports messages similar to the following:
Lustre: OST south-ost12 now serving /dev/hpls/dev17a (e6cf0bf5-a180-46d5-b6ea-cea8947013c1),
but will be in recovery until 9 clients reconnect, or if no clients reconnect for 5:00;
during that time new clients will not be allowed to connect. Recovery progress can be
monitored by watching /proc/fs/lustre/obdfilter/south-ost12/recovery_status.
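The recovery_status file named in the message can be read from a script instead of being inspected by hand. The following is a minimal sketch, assuming the file contains a line of the form "status: VALUE". That layout and the RECOVERING value used in the comment are assumptions and are not confirmed by this manual. The function name recovery_state is hypothetical.

```shell
# Print the value of the first "status:" line in a recovery_status file.
# Assumption: the file contains a line such as "status: RECOVERING".
recovery_state() {
    awk '$1 == "status:" { print $2; exit }' "$1"
}

# Example use on the server (not run here):
#   f=/proc/fs/lustre/obdfilter/south-ost12/recovery_status
#   while [ "$(recovery_state "$f")" = "RECOVERING" ]; do sleep 10; done
```

If the function prints nothing, the file has no status line in the assumed form; read the file directly in that case.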