HP Insight Control Software for Linux User Manual


Status Information: Node / and /var free space

This entry typically displays the status of the /, /var, and /hptc_cluster file systems on the system.

A warning or critical message indicates that the free-space thresholds set for that managed system have been exceeded.

Clean up disk space on the affected file systems.
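You can approximate the free-space check from the shell to see which file system is short of space. The following POSIX sh sketch reads the Capacity column of `df -P` for /, /var, and /hptc_cluster and classifies each file system against warning and critical thresholds. The function names and the example 20% and 10% free-space thresholds are hypothetical. The manual does not state the thresholds configured for a managed system, so substitute the values used at your site.

```shell
#!/bin/sh
# Sketch only: function names and thresholds are hypothetical, not the
# values or tooling that HP Insight Control configures.

# percent_free PATH: print the integer percentage of free space on the
# file system that holds PATH, derived from the df -P "Capacity" column.
percent_free() {
    df -P "$1" | awk 'NR == 2 { sub(/%/, "", $5); print 100 - $5 }'
}

# classify_free PCT_FREE WARN_PCT CRIT_PCT: print OK, WARNING, or CRITICAL.
# Less free space is worse, so the critical threshold is the lower one.
classify_free() {
    if [ "$1" -lt "$3" ]; then
        echo CRITICAL
    elif [ "$1" -lt "$2" ]; then
        echo WARNING
    else
        echo OK
    fi
}

# report_free_space WARN_PCT CRIT_PCT: check /, /var, and /hptc_cluster,
# skipping any path that does not exist or cannot be measured.
report_free_space() {
    for fs in / /var /hptc_cluster; do
        [ -d "$fs" ] || continue
        pct=$(percent_free "$fs")
        [ -n "$pct" ] || continue
        echo "$fs: $(classify_free "$pct" "$1" "$2") ($pct% free)"
    done
}
```

For example, `report_free_space 20 10` prints one line per file system, such as `/var: WARNING (15% free)`, and identifies where to clean up.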

25.14.6 A check_nrpe error occurs during management agents installation

When the gather_all_data script is running, a check_nrpe error like the following is reported:

check_nrpe error: Connection refused by host => server

Corrective Actions:

• If the check_nrpe error is reported for the CMS, use the following command to verify that the nrpe service is running on the CMS:

# ps auxww | grep nrpe

If the nrpe service is not running, use the following commands to start it and rerun the gather_all_data script:

# /etc/init.d/nagios start_nrpe
# /opt/hptc/nagios/libexec/gather_all_data --verbose

• If the output reports that vars.ini has been resynchronized for a managed system, verify that a self-signed certificate exists for the Apache service and that the service is running. For troubleshooting information on the Apache service, see Section 25.3 (page 202).
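The first corrective action (check for nrpe, start it if needed, then rerun gather_all_data) can be combined into one function. This POSIX sh sketch assumes a Linux /proc file system and that the daemon's process name is nrpe. Instead of filtering `ps` output, it reads /proc/<pid>/comm, so it cannot match the grep command itself. The start and rerun commands are the ones given above, and the function names are hypothetical.

```shell
#!/bin/sh
# Sketch only: assumes Linux /proc and a daemon whose process name is "nrpe".
# Function names are hypothetical.

# process_running NAME: succeed if any process's command name equals NAME.
# /proc/<pid>/comm holds the name truncated to 15 characters.
process_running() {
    for comm in /proc/[0-9]*/comm; do
        [ -r "$comm" ] || continue
        if [ "$(cat "$comm" 2>/dev/null)" = "$1" ]; then
            return 0
        fi
    done
    return 1
}

# ensure_nrpe: start nrpe on the CMS and rerun gather_all_data only if the
# daemon is not already running (commands from the corrective action above).
ensure_nrpe() {
    if process_running nrpe; then
        echo "nrpe is running"
        return 0
    fi
    echo "nrpe is not running; starting it"
    /etc/init.d/nagios start_nrpe || return 1
    /opt/hptc/nagios/libexec/gather_all_data --verbose
}
```

Run `ensure_nrpe` as root on the CMS. If nrpe is already running, the function only reports that and changes nothing.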

25.14.7 Nagios gather_all_data script reports check_nrpe errors

These errors include socket timeouts and refused connections. The nrpe daemon is unable to
configure the server because the check_nagios_vars script is unable to write vars.ini to
the server.

Use the ping command with the server's name to display the IP address that the name resolves to. Compare that IP address with the IP address that HP SIM reports for the server. The two addresses must match.
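You can also script the name-to-address comparison. This sketch uses `getent ahostsv4`, which on glibc-based systems resolves a name through the system resolver, as ping does. It then compares the first IPv4 address with the address that HP SIM reports. The sketch does not assume any particular HP SIM command-line interface, so you supply the HP SIM address by hand. The function names are hypothetical.

```shell
#!/bin/sh
# Sketch only: assumes glibc getent with the "ahostsv4" database.
# Function names are hypothetical; the expected address is the one HP SIM
# reports for the server, entered by hand.

# resolve_ipv4 NAME: print the first IPv4 address NAME resolves to (or nothing).
resolve_ipv4() {
    getent ahostsv4 "$1" | awk 'NR == 1 { print $1 }'
}

# compare_ips RESOLVED EXPECTED: print MATCH only for a non-empty, equal pair.
compare_ips() {
    if [ -n "$1" ] && [ "$1" = "$2" ]; then
        echo MATCH
    else
        echo MISMATCH
    fi
}

# check_server_ip NAME SIM_IP: report whether NAME resolves to SIM_IP.
check_server_ip() {
    resolved=$(resolve_ipv4 "$1")
    echo "$1 resolves to ${resolved:-<unresolved>}; HP SIM reports $2:" \
        "$(compare_ips "$resolved" "$2")"
}
```

A MISMATCH result means the name that check_nagios_vars uses does not lead to the address HP SIM knows for that server. This is the condition the comparison above is meant to catch.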

25.14.8 Troubleshooting Nagios problems

The following table describes possible causes of problems related to Nagios and provides actions
to correct them.

Cause/Symptom: Nagios fails to start

If Nagios fails to start, one or more Nagios daemons did not start on the CMS.

Corrective Actions: Follow these steps to start the Nagios daemons manually:

Stop the Nagios service:

# /etc/init.d/nagios stop

Change to the following directory:

# cd /opt/hptc/nagios/bin

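Run Nagios with the -v option. This option checks the configuration file and reports any errors that prevent the daemons from starting: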

# ./nagios -v ../etc/nagios_local.cfg
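You can wrap the configuration check so that its result is explicit. In Nagios Core, `nagios -v CONFIG` verifies the configuration and exits with status 0 when it finds no errors and with a nonzero status otherwise. This sketch assumes the binary under /opt/hptc/nagios/bin follows the same convention. The function name is hypothetical. The default paths are the ones used in the steps above, with `../etc/nagios_local.cfg` resolved from /opt/hptc/nagios/bin.

```shell
#!/bin/sh
# Sketch only: assumes the bundled nagios binary follows the Nagios Core
# convention that "nagios -v CONFIG" exits 0 when the configuration is valid.

# verify_nagios_config [BINARY] [CONFIG]: run BINARY -v CONFIG, keep its
# report in a temporary file on failure, and summarize the result.
verify_nagios_config() {
    bin=${1:-/opt/hptc/nagios/bin/nagios}
    cfg=${2:-/opt/hptc/nagios/etc/nagios_local.cfg}
    log=$(mktemp) || return 2
    if "$bin" -v "$cfg" >"$log" 2>&1; then
        echo "Configuration OK"
        rm -f "$log"
        return 0
    fi
    echo "Configuration errors found; see $log"
    return 1
}
```

After stopping the service, run `verify_nagios_config` as root on the CMS. If the check fails, the saved report lists the configuration errors to correct before you restart Nagios.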

Cause/Symptom: Nagios startup error: Can't find /nagios/cgi-bin/status.cgi.

Corrective Actions: Restart the Apache service on the CMS:

Log into the CMS as root and use the following commands for RHEL operating systems:

# /etc/init.d/httpd stop
# /etc/init.d/httpd start

Log into the CMS as root and use the following commands for SLES:
