HP StorageWorks Scalable File Share User Manual

9.38 The MDS service fails with an ASSERTION(ino == inode->i_ino) message
In rare circumstances, the MDS service encounters a bug (caused by a client node) during the recovery process. This causes the server where the MDS service is running (normally the MDS server) to crash with an LBUG error. When this happens, events similar to the following are displayed in the event log:
LustreError: 11691:0:(mds_open.c:1013:mds_open()) uuid 92ea84f7-3a40-ac0a-74b5-4e0be3d3e3a3
LustreError: 11691:0:(mds_open.c:1016:mds_open()) ASSERTION(ino == inode->i_ino) failed
The value at the end of the first of the two events (in this example, 92ea84f7-3a40-ac0a-74b5-4e0be3d3e3a3) is the UUID of the client node that caused the problem. To find that client node, enter a command similar to the following on each client node:
# lctl device_list | grep 92ea84f7
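If the output of the sfsmgr show log command has been saved to a text file, the client UUID can also be picked out with a short script. The following Python sketch assumes one event per line, in the format of the example events above; it searches within each line, so a leading timestamp or host name is tolerated. The function name find_offending_client_uuid is hypothetical. It remembers the most recent uuid event for each process ID (11691 in the example) and returns that UUID when the same process reports the ASSERTION(ino == inode->i_ino) failure. The first field of the result is the string passed to grep in the command above.

```python
import re
from typing import Optional

# Patterns modelled on the example events in this section (an assumption:
# real log lines may differ). search() is used, so a leading timestamp or
# host name on each line does not prevent a match.
UUID_EVENT = re.compile(
    r"LustreError: (\d+):\d+:\(mds_open\.c:\d+:mds_open\(\)\) uuid "
    r"([0-9a-fA-F]{8}(?:-[0-9a-fA-F]{4}){3}-[0-9a-fA-F]{12})"
)
ASSERT_EVENT = re.compile(
    r"LustreError: (\d+):\d+:\(mds_open\.c:\d+:mds_open\(\)\) "
    r"ASSERTION\(ino\s*==\s*inode->i_ino\) failed"
)


def find_offending_client_uuid(log_lines: list[str]) -> Optional[str]:
    """Hypothetical helper: return the client UUID reported by the same
    process just before the ASSERTION(ino == inode->i_ino) failure."""
    last_uuid_by_pid: dict[str, str] = {}
    for line in log_lines:
        uuid_match = UUID_EVENT.search(line)
        if uuid_match:
            # Remember the latest UUID seen for this process ID.
            last_uuid_by_pid[uuid_match.group(1)] = uuid_match.group(2)
            continue
        assert_match = ASSERT_EVENT.search(line)
        if assert_match and assert_match.group(1) in last_uuid_by_pid:
            return last_uuid_by_pid[assert_match.group(1)]
    return None


# Usage with the two example events shown above.
log = [
    "LustreError: 11691:0:(mds_open.c:1013:mds_open()) uuid 92ea84f7-3a40-ac0a-74b5-4e0be3d3e3a3",
    "LustreError: 11691:0:(mds_open.c:1016:mds_open()) ASSERTION(ino == inode->i_ino) failed",
]
client_uuid = find_offending_client_uuid(log)
print(client_uuid)                # 92ea84f7-3a40-ac0a-74b5-4e0be3d3e3a3
print(client_uuid.split("-")[0])  # 92ea84f7 (the prefix used with grep)
```

Pairing the two events by process ID is a design choice meant to avoid picking up an unrelated uuid event from another process that happens to be interleaved in the log.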
When you have identified the client node that caused the problem, reboot it. After the client node has rebooted, the MDS service can be restarted successfully.
9.39 The MDS service repeatedly crashes with an LBUG error
In rare circumstances, the MDS service encounters a bug during the recovery process. This causes the server where the MDS service is running (normally the MDS server) to crash with an LBUG error. When the MDS service then attempts to fail over to the peer server (usually the administration server), the peer server also crashes. In the meantime, the first server reboots, but may crash again when the MDS service fails back to it. This cycle can continue indefinitely.
If this happens, perform the following steps:
1. Disable the administration server; this stops the administration server from crashing repeatedly. For example, to disable the south1 server, enter the following command on any server in the HP SFS system. You must specify the force=yes option, because the server may be rebooting at the time you run the command:
sfs> disable server south1 force=yes
The administration server may crash one more time before this command takes effect, but it will then reboot and become stable.
2. Examine the event log (using the sfsmgr show log command) to determine the cause of the LBUG error. An event with LBUG in its text marks the point at which the service failed. Just before this event, there will probably be a LustreError message reporting an assertion failure.
An LBUG error can have a number of causes. Search this guide and the HP StorageWorks Scalable File Share Release Notes to find out whether the problem that caused the LBUG error on your system is a known problem, and whether further information is provided on dealing with it. In particular, see Section 9.38 of this guide, which describes one specific known problem that can cause an LBUG error.
At this point, the MDS service may recover normally on the MDS server, in which case no further action is needed. However, the MDS server may continue to crash with the same LBUG error. If this happens, continue with the remaining steps in this section; do not enable the administration server until you have completed these steps.
3. Stop all file systems.
4. Reboot the MDS server.
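The log check in step 2 (find the event containing LBUG, then look just before it for a LustreError message reporting an assertion) can be sketched in Python, assuming the output of sfsmgr show log has been saved with one event per line. The function name lbug_context and the 20-line lookback window are illustrative assumptions, and matching on the word ASSERTION is based only on the example message in Section 9.38; other assertion messages may be worded differently. The sample LBUG line below is made up for illustration.

```python
from typing import Optional


def lbug_context(
    log_lines: list[str], lookback: int = 20
) -> list[tuple[str, Optional[str]]]:
    """Hypothetical helper: pair each event that mentions LBUG with the
    closest preceding LustreError assertion message within `lookback`
    lines, or None if no such message is found."""
    results: list[tuple[str, Optional[str]]] = []
    for i, line in enumerate(log_lines):
        if "LBUG" not in line:
            continue
        assertion: Optional[str] = None
        # Walk backwards, starting from the event just before the LBUG event.
        for prev in reversed(log_lines[max(0, i - lookback):i]):
            if "LustreError" in prev and "ASSERTION" in prev:
                assertion = prev
                break
        results.append((line, assertion))
    return results


# Illustrative input: the assertion line is the example from Section 9.38;
# the LBUG line is made up, since its exact wording is not shown here.
sample = [
    "LustreError: 11691:0:(mds_open.c:1016:mds_open()) ASSERTION(ino == inode->i_ino) failed",
    "(hypothetical) event text containing LBUG",
]
for lbug_event, assertion in lbug_context(sample):
    print("LBUG event:       ", lbug_event)
    print("Preceding assert: ", assertion)
```

The assertion text found this way is what to search for in this guide and in the release notes.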