HP Insight Cluster Management Utility User Manual
Select plotting options, then click Generate Plot.
Figure 43 ColPlot results
5.5.7 Monitoring GPUs and coprocessors
5.5.7.1 Monitoring NVIDIA GPUs
If your client nodes contain NVIDIA GPUs and are running version 270.xx.xx or newer of the
NVIDIA GPU driver, you can monitor your GPUs with HP Insight CMU.
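To check a node against the 270.xx.xx minimum, you can compare the major component of its driver version. The shell function below is a minimal sketch and is not part of HP Insight CMU. Its name is hypothetical. It assumes you supply the version string yourself, for example from nvidia-smi --query-gpu=driver_version --format=csv,noheader on drivers whose nvidia-smi supports the --query-gpu option.

```shell
#!/bin/sh
# Sketch only: driver_version_ok is a hypothetical helper, not a CMU command.
# The caller supplies VERSION (for example from nvidia-smi on newer drivers).

# driver_version_ok VERSION [MIN_MAJOR]
# Returns 0 if the major component of VERSION (text before the first dot)
# is an integer >= MIN_MAJOR, which defaults to 270.
driver_version_ok() {
    local version="$1"
    local min_major="${2:-270}"
    local major="${version%%.*}"
    # Reject an empty or non-numeric major component.
    case "$major" in
        '' | *[!0-9]*) return 1 ;;
    esac
    [ "$major" -ge "$min_major" ]
}
```

For example, driver_version_ok 270.41.19 returns 0, while driver_version_ok 256.53 returns 1.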
If you have not already done so, install NVIDIA GPU driver version 270.xx.xx or newer on your
client nodes. You can do this in two ways:
1. Install the NVIDIA GPU driver manually on one of the client nodes, back up the client image,
and clone the remaining clients with this new image.
2. Use the script /opt/cmu/contrib/install_nvidia.pl to install the NVIDIA GPU driver
on all running clients. For details, see the file /opt/cmu/contrib/install_nvidia.README.
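The cmu_config_nvidia script used to enable GPU monitoring takes the number of GPUs on each client as its argument, so it can help to confirm that count on your clients first. The shell functions below are an illustrative sketch with hypothetical names, not part of HP Insight CMU. They assume that nvidia-smi -L prints one line beginning "GPU <index>:" per GPU, and, because the script takes a single count, they check that the per-client counts agree.

```shell
#!/bin/sh
# Sketch only: count_gpus and common_count are hypothetical helpers.
# Assumes `nvidia-smi -L` prints one "GPU <index>: ..." line per GPU.

# count_gpus: read `nvidia-smi -L` output on stdin and print the GPU count.
count_gpus() {
    # grep -c prints 0 but exits 1 when nothing matches; treat that as success.
    grep -c '^GPU [0-9]' || true
}

# common_count COUNT...: print COUNT if every argument is identical;
# otherwise report the mismatch on stderr and return 1.
common_count() {
    [ "$#" -gt 0 ] || return 1
    local first="$1"
    local c
    for c in "$@"; do
        if [ "$c" != "$first" ]; then
            echo "GPU counts differ across clients: $*" >&2
            return 1
        fi
    done
    echo "$first"
}
```

On each client, nvidia-smi -L | count_gpus prints that client's count. If common_count accepts the collected counts, the value it prints is the per-client GPU count to pass to cmu_config_nvidia.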
To enable GPU monitoring, you must update the /opt/cmu/etc/ActionAndAlertsFile.txt file with
entries for HP Insight CMU GPU monitoring. To do this, run the script
/opt/cmu/bin/cmu_config_nvidia, which takes the number of GPUs on each client as an
argument. The following example updates ActionAndAlertsFile.txt to monitor clients that
have 3 GPUs each. Monitoring must be restarted for the updates to take effect.
# cmu_config_nvidia 3
CMU GPU monitoring enables driver persistence mode on all GPUs and requires all GPU-enabled clients to be running
NVIDIA driver 270.xx.xx or newer. Continue only if an appropriate driver is installed on the clients and
persistence mode is permissible.
Continue? [y/n] y
Configuring GPU monitoring in CMU...
GPU monitoring configured successfully.
Copy of orignial /opt/cmu/etc/ActionAndAlertsFile.txt can found in
/opt/cmu/etc/ActionAndAlertsFile.txt_before_cmu_config_nvidia_config
Please restart CMU ('/etc/init.d/cmu restart') to enable these changes.
# /etc/init.d/cmu restart
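After the restart, you may want to confirm what changed. Because cmu_config_nvidia keeps a copy of the original file, comparing the current file against that copy shows the entries the script added. Because CMU GPU monitoring enables persistence mode, you can also check that every GPU reports it as enabled once monitoring is running. The shell functions below are an illustrative sketch with hypothetical names, not part of HP Insight CMU. The default paths are taken from the script output above, and the persistence check assumes your nvidia-smi supports --query-gpu=persistence_mode --format=csv,noheader, which prints one Enabled or Disabled line per GPU.

```shell
#!/bin/sh
# Sketch only: added_lines and persistence_all_enabled are hypothetical helpers.

# added_lines [BACKUP] [CURRENT]: print lines present in CURRENT but not in
# BACKUP, that is, the entries cmu_config_nvidia added.
added_lines() {
    local backup="${1:-/opt/cmu/etc/ActionAndAlertsFile.txt_before_cmu_config_nvidia_config}"
    local current="${2:-/opt/cmu/etc/ActionAndAlertsFile.txt}"
    # In diff's default output, lines found only in the second file start with "> ".
    diff "$backup" "$current" | sed -n 's/^> //p'
}

# persistence_all_enabled: read the output of
#   nvidia-smi --query-gpu=persistence_mode --format=csv,noheader
# on stdin; return 0 only if there is at least one line and every line is "Enabled".
persistence_all_enabled() {
    local line
    local seen=0
    # The "|| [ -n ... ]" test also handles a final line without a trailing newline.
    while IFS= read -r line || [ -n "$line" ]; do
        # Ignore surrounding whitespace in the CSV field.
        line=$(printf '%s' "$line" | tr -d '[:space:]')
        [ "$line" = "Enabled" ] || return 1
        seen=1
    done
    [ "$seen" -eq 1 ]
}
```

Running added_lines with no arguments on the management node compares the two files named in the script output. On a client, piping the nvidia-smi query above into persistence_all_enabled returns 0 only when every GPU reports persistence mode as enabled.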