HP XC System 3.x Software User Manual
NOTE:
The --nodelist=nodelist option is particularly useful for identifying
problematic nodes. If you use this option together with the --nnodes=n
option, the --nnodes=n option is ignored.
• The --queue LSF_queue option specifies the LSF queue for the performance
health tests.
test
Indicates the test to perform. The following tests are available:
cpu
Tests CPU core performance using the Linpack
benchmark.
cpu_usage
Tests CPU core usage. All CPU cores should be idle during the test.
This test reports a node that uses more than 10 percent (by default)
of its CPU cores. The head node is excluded from this test.
memory
Uses the streams benchmark to test memory
performance.
memory_usage
Tests memory usage. This test reports a node that
uses more than 25 percent (by default) of its
memory.
network_stress
Tests network performance under stress using the Pallas
benchmark's Alltoall, Allgather, and Allreduce tests. Run these
tests on a large number of nodes for the most accurate results.
The default number of nodes is 4, which is also the minimum
value that should be used. The --all_group option allows you to
select the node grouping size.
network_bidirectional
Tests network performance between pairs of
nodes using the Pallas benchmark's Exchange
test.
network_unidirectional
Tests network performance between pairs of
nodes using the HP MPI ping_pong_ring test.
NOTE:
Except for the network_stress and network_bidirectional tests, these tests
apply only to systems that install LSF-HPC incorporated with SLURM. The
network_stress and network_bidirectional tests also function under
Standard LSF.
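The reporting rule described for the cpu_usage and memory_usage tests can be illustrated with a short sketch. This is not how ovp is implemented; it only restates the documented defaults (more than 10 percent CPU usage, more than 25 percent memory usage, head node excluded from cpu_usage) as code. The function name, the node names, and the sample readings are all hypothetical.

```python
# Default limits taken from the test descriptions above (percent).
DEFAULT_CPU_USAGE_LIMIT = 10.0
DEFAULT_MEMORY_USAGE_LIMIT = 25.0


def nodes_over_limit(usage_by_node, limit_percent, exclude=()):
    """Return the sorted names of nodes whose usage exceeds limit_percent.

    "More than" is read as a strict comparison, so a node sitting exactly
    at the limit is not reported. Names listed in `exclude` are skipped.
    """
    return sorted(
        node
        for node, percent in usage_by_node.items()
        if node not in exclude and percent > limit_percent
    )


# Hypothetical readings for illustration only; this is not real ovp output.
cpu_usage = {"head": 55.0, "n1": 2.0, "n2": 37.5, "n3": 10.0}
memory_usage = {"n1": 12.0, "n2": 25.1, "n3": 80.0}

# The head node is left out of the CPU usage check, as the manual states.
print(nodes_over_limit(cpu_usage, DEFAULT_CPU_USAGE_LIMIT, exclude={"head"}))
print(nodes_over_limit(memory_usage, DEFAULT_MEMORY_USAGE_LIMIT))
```

With these sample readings, only n2 exceeds the CPU limit (n3 sits exactly at 10 percent and is not reported), while n2 and n3 both exceed the memory limit.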
You can list the available tests with the ovp -l command:
$ ovp -l
Test list for perf_health:
cpu_usage
memory_usage
cpu
memory
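If you want to check from a script whether a particular test is available, the listing above can be parsed. The following is a minimal sketch that assumes the output has exactly the shape shown here: one "Test list for <group>:" header followed by one test name per line. The function name is hypothetical, and the sample text is copied from the listing above rather than captured from a live system; real output may contain more groups or tests.

```python
# Sample text copied from the `ovp -l` listing shown above.
SAMPLE_OUTPUT = """\
Test list for perf_health:
cpu_usage
memory_usage
cpu
memory
"""


def parse_test_list(text):
    """Return (group, tests) from a single-group `ovp -l`-style listing."""
    lines = [line.strip() for line in text.splitlines() if line.strip()]
    if not lines:
        raise ValueError("empty listing")
    header, tests = lines[0], lines[1:]
    prefix = "Test list for "
    if not header.startswith(prefix) or not header.endswith(":"):
        raise ValueError(f"unexpected header line: {header!r}")
    group = header[len(prefix):-1]  # strip the prefix and the trailing colon
    return group, tests


group, tests = parse_test_list(SAMPLE_OUTPUT)
print(group)
print(tests)
print("cpu_usage" in tests)
```

On a live system, the same function could be fed the captured standard output of the command instead of the sample string.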
7.6 Running Performance Health Tests