HP StorageWorks Scalable File Share User Manual
Managing email alerts
lustre_bug
  Purpose: Alerts you when a fault occurs in the Lustre software.
  Email Alert Filter: facility=kern && data contains "LustreError" && data contains "LBUG"
  Action Required: The server where the fault occurred normally reboots automatically. If this does not happen, reboot the server. See also Section 9.39 for information on handling LBUG errors on the MDS node.
ost_out_of_space
  Purpose: Alerts you when the percentage of used space on an OST service exceeds the value of the ost_critical_size attribute (see Section 5.11 for more information). By default, the alert has a throttle period of 86400 seconds (24 hours).
  Email Alert Filter: facility=storage && data contains "OST out of space critical condition"
  Action Required: Delete files from the file system to prevent this problem.
raid_degraded
  Purpose: Alerts you when a LUN that is a component of a mirrored LUN fails.
  Email Alert Filter: facility=kern && data contains "raid degraded"
  Action Required: See Section 9.33.3 (SFS20 arrays) or Section 9.35 (EVA4000 arrays).
restart_fs
  Purpose: Alerts you when a system parameter or other condition has changed and the file systems must be restarted. It is normal for this alert to be triggered while you are following the procedures described in Chapter 7 to change system parameters. The alert can also be triggered by a hardware change that alters the file system configuration.
  Email Alert Filter: facility=lustre && data contains "Please restart filesystem"
  Action Required: Review Chapter 7 and Chapter 8 to verify that you have followed the correct procedure to change a system parameter or hardware component (see also Section 4.1.1). Then stop and restart all file systems. Client nodes must remount all file systems.
server_down
  Purpose: Alerts you when a server is not in the running state.
  Email Alert Filter: facility=server && data contains "Down"
  Action Required: If the server has crashed, it normally reboots automatically. If this does not happen, reboot the server.
service_lun
  Purpose: Alerts you when a service LUN is not available.
  Email Alert Filter: data contains "IO error reading quorum partition"
  Action Required: The most common cause of this failure is that the array where the LUN is located is hung. Power cycle the array where the LUN is located and then reboot the servers attached to the array.
sm_down
  Purpose: Alerts you when communication within the HP SFS system is slowing down. This problem occurs occasionally when the system is under heavy load.
  Email Alert Filter: data contains "SM non-responsive"
  Action Required: If this problem occurs infrequently, it can be ignored. However, if it occurs frequently, it may indicate a problem with the SFS20 array where the service LUN for the specified server is located. Use the show array command and the show array array_number command to check the SFS20 disks for errors.
Table 6-4 Default email alerts
Columns: Email Alert Name, Purpose, Email Alert Filter, Action Required
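Each filter in Table 6-4 joins its conditions with &&, so a filter matches only when every condition holds. The following Python sketch models that reading of the filter syntax, together with the throttle period described for ost_out_of_space. It assumes that facility=X is an exact comparison against the message facility, that data contains "..." is a case-sensitive substring test, and that a throttle period suppresses repeat alerts from the same rule for that many seconds after an alert is sent. The names LogEvent, AlertRule, and AlertDispatcher are hypothetical; this is an illustration under those assumptions, not the HP SFS alerting implementation.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass(frozen=True)
class LogEvent:
    facility: str     # e.g. "kern", "storage", "lustre", "server"
    data: str         # message text
    timestamp: float  # seconds since an arbitrary epoch


@dataclass(frozen=True)
class AlertRule:
    name: str
    facility: Optional[str]        # None: the filter does not test facility
    substrings: Tuple[str, ...]    # every one must appear in event.data
    throttle_seconds: float = 0.0  # 0 means no throttling (assumption)

    def matches(self, event: LogEvent) -> bool:
        # Assumption: "facility=X" is an exact string comparison.
        if self.facility is not None and event.facility != self.facility:
            return False
        # "&&" joins conditions, so every "data contains" test must hold.
        return all(s in event.data for s in self.substrings)


# The default filters from Table 6-4, transcribed into AlertRule form.
DEFAULT_RULES: List[AlertRule] = [
    AlertRule("lustre_bug", "kern", ("LustreError", "LBUG")),
    AlertRule("ost_out_of_space", "storage",
              ("OST out of space critical condition",),
              throttle_seconds=86400),
    AlertRule("raid_degraded", "kern", ("raid degraded",)),
    AlertRule("restart_fs", "lustre", ("Please restart filesystem",)),
    AlertRule("server_down", "server", ("Down",)),
    AlertRule("service_lun", None, ("IO error reading quorum partition",)),
    AlertRule("sm_down", None, ("SM non-responsive",)),
]


class AlertDispatcher:
    """Hypothetical dispatcher: returns the names of alerts that would be emailed."""

    def __init__(self, rules: List[AlertRule]) -> None:
        self.rules = rules
        self._last_sent: Dict[str, float] = {}

    def process(self, event: LogEvent) -> List[str]:
        fired: List[str] = []
        for rule in self.rules:
            if not rule.matches(event):
                continue
            last = self._last_sent.get(rule.name)
            # Suppress a repeat alert that arrives within the throttle period.
            # A suppressed event does not restart the period.
            if last is not None and event.timestamp - last < rule.throttle_seconds:
                continue
            self._last_sent[rule.name] = event.timestamp
            fired.append(rule.name)
        return fired
```

Under this model, two matching ost_out_of_space messages logged an hour apart produce only one email, while rules without a throttle period send an email for every matching message.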