There are many disadvantages to using such an operating system on a NUMA platform. The high latency of
remote memory accesses can leave the processors under-utilized, constantly waiting for data to be transferred
to the local node, and the NUMA connection can become a bottleneck for applications with high
memory-bandwidth demands.
Furthermore, performance on such a system can be highly variable. It varies, for example, if an application has
memory located locally on one benchmarking run, but a subsequent run happens to place all of that memory
on a remote node. This phenomenon can make capacity planning difficult. Finally, processor clocks might not
be synchronized between multiple nodes, so applications that read the clock directly might behave incorrectly.
Some high-end UNIX systems provide support for NUMA optimizations in their compilers and programming
libraries. This support requires software developers to tune and recompile their programs for optimal
performance. Optimizations for one system are not guaranteed to work well on the next generation of the same
system. Other systems have allowed an administrator to explicitly decide on the node on which an application
should run. While this might be acceptable for certain applications that demand 100 percent of their memory
to be local, it creates an administrative burden and can lead to imbalance between nodes when workloads
change.
Ideally, the system software provides transparent NUMA support, so that applications can benefit immediately
without modifications. The system should maximize the use of local memory and schedule programs
intelligently without requiring constant administrator intervention. Finally, it must respond well to changing
conditions without compromising fairness or performance.
How ESX/ESXi NUMA Scheduling Works
ESX/ESXi uses a sophisticated NUMA scheduler to dynamically balance processor load and memory locality.
1. Each virtual machine managed by the NUMA scheduler is assigned a home node. A home node is one of
   the system's NUMA nodes containing processors and local memory, as indicated by the System Resource
   Allocation Table (SRAT).
2. When memory is allocated to a virtual machine, the ESX/ESXi host preferentially allocates it from the
   home node.
3. The NUMA scheduler can dynamically change a virtual machine's home node to respond to changes in
   system load. The scheduler might migrate a virtual machine to a new home node to reduce processor load
   imbalance. Because this might cause more of its memory to be remote, the scheduler might migrate the
   virtual machine's memory dynamically to its new home node to improve memory locality. The NUMA
   scheduler might also swap virtual machines between nodes when this improves overall memory locality.
Some virtual machines are not managed by the ESX/ESXi NUMA scheduler. For example, if you manually set
the processor affinity for a virtual machine, the NUMA scheduler might not be able to manage this virtual
machine. Virtual machines that have more virtual processors than the number of physical processor cores
available on a single hardware node cannot be managed automatically. Virtual machines that are not managed
by the NUMA scheduler still run correctly. However, they do not benefit from ESX/ESXi NUMA optimizations.
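The two exclusion rules described above can be summarized in a short predicate. This is a simplified sketch of the stated conditions, not an ESX/ESXi API; the function name and parameters are illustrative:

```python
def numa_managed(vcpus, cores_per_node, has_manual_cpu_affinity):
    """Return True if a VM qualifies for automatic NUMA management
    under the two rules described in the text (simplified sketch):
    no manual processor affinity, and no more virtual processors
    than a single hardware node has physical cores."""
    if has_manual_cpu_affinity:
        # Manually pinned VMs may fall outside the NUMA scheduler.
        return False
    # A VM wider than one node cannot fit on a single home node.
    return vcpus <= cores_per_node
```

A 4-vCPU virtual machine on a host with 4 cores per node would be managed; an 8-vCPU virtual machine on the same host, or any VM with manual affinity, would still run correctly but without the NUMA optimizations.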
The NUMA scheduling and memory placement policies in ESX/ESXi can manage all virtual machines
transparently, so that administrators do not need to address the complexity of balancing virtual machines
between nodes explicitly.
The optimizations work seamlessly regardless of the type of guest operating system. ESX/ESXi provides
NUMA support even to virtual machines that do not support NUMA hardware, such as Windows NT 4.0. As
a result, you can take advantage of new hardware even with legacy operating systems.
vSphere Resource Management Guide
74
VMware, Inc.