HP StorageWorks Scalable File Share User Manual

9.42 Troubleshooting client mount failures

When a node is booting, you can monitor the progress on the console. When the SFS service starts on the node, it prints a message similar to the following:

Mounting sfs filesystems: sfsalias/scratch

If all the file systems mount successfully, the SFS service exits and the message changes to something similar to the following:

Mounting sfs filesystems: sfsalias/scratch sfsalias/workspace sfsalias/data [ OK ]

If a file system is being mounted in the background, the SFS service displays a [WARNING] code when it exits.
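The three console states described above (no status yet, [ OK ], and [WARNING]) can be told apart mechanically. The following is a minimal Python sketch, assuming the message format matches the examples shown here; the function name classify_mount_line is hypothetical, and whether a [WARNING] line also lists the file system names is an assumption.

```python
import re

PREFIX = "Mounting sfs filesystems:"

def classify_mount_line(line):
    """Interpret one SFS service boot message (format assumed from the manual's examples)."""
    if not line.startswith(PREFIX):
        raise ValueError("not an SFS mount message")
    rest = line[len(PREFIX):].strip()
    # A trailing bracketed code such as "[ OK ]" or "[WARNING]" appears
    # only after the SFS service has exited.
    match = re.search(r"\[\s*(\w+)\s*\]\s*$", rest)
    if match:
        status = match.group(1).upper()
        filesystems = rest[:match.start()].split()
    else:
        status = None
        filesystems = rest.split()
    if status == "OK":
        meaning = "all file systems mounted"
    elif status == "WARNING":
        meaning = "a file system is being mounted in the background"
    elif status is None:
        meaning = "SFS service still running; mounts in progress"
    else:
        meaning = "unrecognized status code"
    return {"filesystems": filesystems, "status": status, "meaning": meaning}
```

A line with no bracketed code means the service has not yet exited, which is the case to watch when a mount appears to stall.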

If a mount operation stalls, log in to the SFS system and troubleshoot the problem as follows:

# ssh sfsalias

Start the SFS CLI and run the show filesystem command as follows:

# sfsmgr
.

sfs> show filesystem
Name          State          Services
------------- -------------- ----------------------------------
data          started        mds4: running, ost[145-184]: running
scratch       started        mds5: running, ost[185-188]: running
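If you capture this output to a file or a variable, it can be turned into a structure that is easier to inspect. The following Python sketch assumes the three-column layout shown above with one line per file system; the function name parse_show_filesystem is hypothetical, and real output may wrap long service lists, which this sketch does not handle.

```python
import re

SAMPLE = """\
Name          State          Services
------------- -------------- ----------------------------------
data          started        mds4: running, ost[145-184]: running
scratch       started        mds5: running, ost[185-188]: running
"""

def parse_show_filesystem(text):
    """Map each file system name to its state and per-service status."""
    result = {}
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines, the prompt, the header row, and the dashed rule.
        if not line or line.startswith(("sfs>", "Name", "---")):
            continue
        parts = line.split(None, 2)
        if len(parts) < 3:
            continue
        name, state, service_text = parts
        services = {}
        # Split on commas that are not inside parentheses, so a status
        # such as "running(raid: degraded)" stays in one piece.
        for entry in re.split(r",\s*(?![^()]*\))", service_text):
            service, _, status = entry.partition(":")
            services[service.strip()] = status.strip()
        result[name] = {"state": state, "services": services}
    return result
```

Calling parse_show_filesystem(SAMPLE) yields one entry per file system, keyed by name, with the service statuses kept as the raw strings that the CLI printed.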

Examine the output from the command, and consider the following factors:

•

If any of the file system services are not in the running or recovering state, there is a problem with the file system. You must correct the problem before proceeding; otherwise, the file system cannot be mounted on any node.

In some cases, the output from the show filesystem command may indicate that there is a problem with the service, but that the service is still running. For example, a service in the data file system might show a status similar to the following:

running(raid: degraded)

This indicates that the service is mirrored over two arrays, and that one of the arrays has failed. In this case, although the situation needs attention, the service is running and the problem will not affect mount operations.

•

If all file system services are in the running state, you can expect mount operations to complete in a relatively short time. However, several factors can extend the mount time to several minutes, including the following:

•

One or more MDS or OST services are running on their backup server. The mount operation first attempts to connect to the primary server for the service. If this attempt fails because the primary server is down, Lustre attempts to connect to the backup server after approximately 50 seconds.

•

An InfiniBand port is not in the PORT_ACTIVE state.

In this case, the SFS service does not attempt to mount the file system; instead, it polls the status of the port every few seconds. If the mount operation is taking place in the background (that is, the background mount option was specified for the mount operation), the poll takes place in the background. However, if the mount operation is taking place in the foreground, the boot process stalls in the SFS service until InfiniBand has finished initializing.
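The service-state rules given earlier in this list can be summarized as a small decision function. The following Python sketch is illustrative only: the function names and the three result labels are hypothetical, "stopped" in the usage note is a made-up status used as an example of a state that is neither running nor recovering, and treating any parenthesized qualifier as "needs attention" generalizes from the single running(raid: degraded) example.

```python
def classify_service(status):
    """Classify one service status string from the show filesystem output."""
    # The base state is the text before any "(...)" qualifier.
    base = status.split("(", 1)[0].strip()
    if base not in ("running", "recovering"):
        # The file system cannot be mounted on any node until this is fixed.
        return "blocking"
    if "(" in status:
        # For example running(raid: degraded): needs attention,
        # but does not affect mount operations.
        return "attention"
    return "ok"

def blocking_services(services):
    """Return the names of services that prevent the file system from mounting."""
    return [name for name, status in services.items()
            if classify_service(status) == "blocking"]
```

For example, blocking_services({"mds4": "running(raid: degraded)", "ost1": "stopped"}) returns only ost1, because the degraded service is still running. If the list is empty and a mount is still slow, the remaining causes to check are the ones described above: services running on their backup server, or an InfiniBand port that is not yet in the PORT_ACTIVE state.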