
Turbo boost, Node interleaving – Dell PowerEdge 1655MC User Manual


Optimal BIOS settings for HPC with Dell PowerEdge 12th generation servers

The Performance Optimized System Profile focuses on pure performance. Turbo Boost is enabled;
C States and C1E are disabled.

The Dense Configuration Optimized profile is for systems that have high DIMM count
configurations, where reliability is prioritized over power savings or performance considerations.
Performance options like Turbo Boost are disabled, and memory-based options are prioritized.
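The settings named for these two profiles can be written down as a small lookup table. The sketch below is illustrative only: the key names are hypothetical (they are not BIOS token names), and any setting the text does not state is recorded as None rather than guessed.

```python
from typing import Dict, Optional

# Hypothetical key names; only settings stated in the text are filled in.
# None means "not stated in the text", not "disabled".
PROFILES: Dict[str, Dict[str, Optional[bool]]] = {
    "Performance Optimized": {
        "turbo_boost": True,
        "c_states": False,
        "c1e": False,
    },
    "Dense Configuration Optimized": {
        "turbo_boost": False,
        "c_states": None,
        "c1e": None,
    },
}


def turbo_enabled(profile: str) -> Optional[bool]:
    """Return the Turbo Boost setting for a preset profile, or None if unknown."""
    return PROFILES.get(profile, {}).get("turbo_boost")
```

For example, turbo_enabled("Dense Configuration Optimized") returns False, while an unlisted profile name returns None.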

The performance and energy efficiency of the four preset System Profiles are discussed in Section
5.2.

The Custom setting is for use cases where the preset profiles do not meet the application
requirements. One example is a low-latency environment such as High Frequency Trading. This
option is covered in Section 3.5.

3.2. Turbo Boost

The Turbo boost option can be tuned from the BIOS System Profile menu as described in Section
3.1.

Turbo boost [9] is a feature that was introduced in the Intel Xeon 5500 series processor (code named
Nehalem, supported in Dell's previous 11th generation servers). When Turbo boost is enabled, it
can provide improved performance by increasing the CPU frequency over the base operating
frequency. However, even when Turbo boost is enabled, it is engaged only when there is available
power headroom and the system is operating below power, current, and temperature specification
limits.
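The engagement condition above is a conjunction: the BIOS option must be enabled, and power, current, and temperature must all be below their specification limits. A minimal sketch of that logic follows. The Headroom type, its field names, and the numbers in the usage note are hypothetical, and this is not how the processor actually implements the decision.

```python
from dataclasses import dataclass


@dataclass
class Headroom:
    """Hypothetical snapshot of the operating point versus its limits."""
    power_w: float
    power_limit_w: float
    current_a: float
    current_limit_a: float
    temp_c: float
    temp_limit_c: float


def turbo_may_engage(turbo_enabled: bool, h: Headroom) -> bool:
    """True only if Turbo is enabled and all three quantities are below limits."""
    if not turbo_enabled:
        return False
    return (
        h.power_w < h.power_limit_w
        and h.current_a < h.current_limit_a
        and h.temp_c < h.temp_limit_c
    )
```

With made-up numbers, Headroom(95, 115, 60, 80, 70, 85) allows Turbo when the option is enabled, whereas raising temp_c to 90 blocks it even though power and current still have headroom.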

The Sandy Bridge processors use the second generation of this technology, Intel Turbo Boost
technology 2.0. The enhancements made in this generation should improve Turbo residency, i.e.,
how often and how long a core engages in Turbo, when compared to past generations. Within the
Intel Xeon E5-2600 processor family, some SKUs include the Turbo boost technology and some do
not. The exact processor model will determine if Turbo boost technology is available. [7, 8]

3.3. Node interleaving

As described in Section 2, the Intel Xeon E5-2600 processors are based on the Non-Uniform Memory
Access (NUMA) architecture and have an integrated memory controller. Access to the memory
channels directly connected to the processor is considered local. Access to the memory channels
connected to the other processor is remote. Local access is faster than remote access, which
makes memory access non-uniform. Figure 2 shows that, for a single thread, memory bandwidth to
remote memory is 43% lower than to local memory, since every remote access needs to traverse the
QPI links between the sockets.
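To make the 43% figure concrete, the arithmetic can be sketched as follows. Only the 43% value comes from the text. The function names are illustrative, and the 10.0 used in the usage note is an arbitrary example value, not a measured bandwidth.

```python
REMOTE_PENALTY = 0.43  # from the text: remote bandwidth is 43% lower (single thread)


def remote_bandwidth(local_bw: float) -> float:
    """Remote single-thread bandwidth implied by the 43% figure."""
    return local_bw * (1.0 - REMOTE_PENALTY)


def local_speedup_over_remote() -> float:
    """How many times higher local bandwidth is than remote bandwidth."""
    return 1.0 / (1.0 - REMOTE_PENALTY)
```

If a thread reached 10.0 units of bandwidth to local memory, the same thread would reach about 5.7 units to remote memory. Put the other way, local bandwidth is roughly 1.75 times the remote bandwidth.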

The BIOS provides an option to stripe memory access across the two memory controllers in the
dual-socket system. This makes the memory equidistant from both memory controllers, making the
access time uniform. This option can be set by enabling Node Interleaving from the BIOS Memory
Settings menu. Figure 2 shows that, for a single thread, memory bandwidth to interleaved memory
is 26% lower than to local memory. This option is useful for cases where the data set needed by a
process will not fit into local memory, or if the application spawns more threads than can fit on the
local socket. With memory interleaved across the two memory controllers, the worst-case scenario
of a remote memory access for every data request can be avoided.
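The trade-off can be explored with a rough model. The sketch below normalizes local bandwidth to 1.0 and uses the two figures quoted above (43% lower for remote, 26% lower for interleaved). The blending rule is an assumption made for illustration, not something the paper states: it treats the time per byte as the weighted sum of the local and remote times per byte.

```python
LOCAL = 1.00               # local bandwidth, normalized
REMOTE = 1.00 - 0.43       # from the text: 43% lower than local
INTERLEAVED = 1.00 - 0.26  # from the text: 26% lower than local


def blended_bandwidth(remote_fraction: float) -> float:
    """Assumed model: time per byte is a weighted sum of local and remote time."""
    if not 0.0 <= remote_fraction <= 1.0:
        raise ValueError("remote_fraction must be between 0 and 1")
    time_per_byte = (1.0 - remote_fraction) / LOCAL + remote_fraction / REMOTE
    return 1.0 / time_per_byte


def breakeven_remote_fraction() -> float:
    """Remote fraction at which the blended model equals the interleaved figure."""
    return (1.0 / INTERLEAVED - 1.0 / LOCAL) / (1.0 / REMOTE - 1.0 / LOCAL)
```

Under this assumed model, a 50/50 mix of local and remote accesses gives about 0.73 of local bandwidth, close to the 0.74 reported for interleaving. The break-even point is a remote fraction of roughly 0.47: a process that would otherwise make more than about that share of its accesses remotely does better with Node Interleaving enabled, while a process with mostly local accesses does better with it disabled.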