Cpu binding considerations, Single tcp connection performance settings – Dell Emulex Family of Adapters User Manual

Emulex Drivers for Windows User Manual
P010077-01A Rev. A
3. Configuration
NIC Driver Configuration
Some applications run slower with interrupt coalescing enabled, particularly applications
that wait for the current network transfer to complete before they post
additional work. An application that sends and receives one network message before
posting the next message is considered latency-bound. A latency-bound
application needs an interrupt to proceed to the next work item, so reducing the
number of interrupts directly reduces its network throughput. The Microsoft iSCSI
Initiator is generally considered a latency-bound application unless the I/O sizes are
very large.
When tuning the system, you must balance the extra CPU usage caused by interrupts
against the potential decrease in total throughput for latency-bound applications.
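The trade-off described above can be illustrated with a rough model. The sketch below assumes an application that keeps exactly one message in flight, so its throughput is the message size divided by the time per message. The function name and all figures (4 KB messages, 50 microseconds of base latency, 100 microseconds of added coalescing delay) are hypothetical values chosen for illustration, not measurements from the adapter.

```python
def latency_bound_throughput(message_bytes, base_latency_s, coalescing_delay_s=0.0):
    """Estimate bytes per second for an application with one message in flight.

    Simplified model: each message costs the base round-trip latency plus any
    extra delay the adapter adds while it waits to coalesce interrupts.
    """
    per_message_time = base_latency_s + coalescing_delay_s
    if per_message_time <= 0:
        raise ValueError("total time per message must be positive")
    return message_bytes / per_message_time


if __name__ == "__main__":
    # Hypothetical figures, for illustration only.
    without_delay = latency_bound_throughput(4096, 50e-6)
    with_delay = latency_bound_throughput(4096, 50e-6, coalescing_delay_s=100e-6)
    print(f"no coalescing delay:   {without_delay / 1e6:.1f} MB/s")
    print(f"with coalescing delay: {with_delay / 1e6:.1f} MB/s")
```

In this model, any delay added per interrupt is paid once per message, which is why a latency-bound workload loses throughput even though the CPU does less interrupt work.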
CPU Binding Considerations
Windows applications may set a processor affinity, which binds a program to a
particular CPU in a multiprocessor computer. However, with the recent additions
to the Windows networking stack, manually configuring CPU affinity is not
recommended.
The advantage of application affinity for network applications is based on choosing the
ideal relationship between the DPC and application affinity to reduce processor-cache
coherency cycles. The ideal mapping may require that both the DPC and application
run on the same processor, different processors, or different cores of a dual-core
processor that share a common memory cache. Even when the best affinity relationship
is determined, it is impossible to enforce this relationship because RSS or TCP
offloading chooses the DPC processor.
The driver uses multiple parallel DPCs that are explicitly assigned to particular CPUs
for processing both RSS and TCP offloading tasks. Each TCP connection is assigned to a
particular CPU for processing. This delivers the main benefit of assigning CPU affinities,
fewer CPU cache misses, without any user configuration.
Explicit processor affinity assignments are not necessary for the driver because the
advantages of assigning processor affinities are realized by using RSS. The only reason
to experiment with application and interrupt CPU affinity is when performing isolated
networking benchmarks.
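For the isolated benchmark case mentioned above, application affinity can be set programmatically. The following is a minimal sketch, assuming a system with no more than 64 logical processors (a single processor group). It calls the Win32 functions GetCurrentProcess and SetProcessAffinityMask through ctypes. The helper names affinity_mask and bind_current_process are hypothetical, and on non-Windows platforms the sketch only computes the mask without applying it.

```python
import sys


def affinity_mask(cpus):
    """Build a bitmask with one bit set per logical CPU index."""
    mask = 0
    for cpu in cpus:
        if cpu < 0:
            raise ValueError("CPU index must be non-negative")
        mask |= 1 << cpu
    return mask


def bind_current_process(cpus):
    """Request that this process run only on the given CPUs.

    Returns the mask that was requested. The binding is applied only on
    Windows; on other platforms the mask is computed and returned unchanged.
    """
    mask = affinity_mask(cpus)
    if sys.platform == "win32":
        import ctypes
        from ctypes import wintypes

        kernel32 = ctypes.WinDLL("kernel32", use_last_error=True)
        kernel32.GetCurrentProcess.restype = wintypes.HANDLE
        # The second parameter is a DWORD_PTR, which is pointer-sized.
        kernel32.SetProcessAffinityMask.argtypes = [wintypes.HANDLE, ctypes.c_size_t]
        kernel32.SetProcessAffinityMask.restype = wintypes.BOOL
        if not kernel32.SetProcessAffinityMask(kernel32.GetCurrentProcess(), mask):
            raise ctypes.WinError(ctypes.get_last_error())
    return mask


if __name__ == "__main__":
    # Example: restrict a benchmark process to logical CPU 0.
    print(hex(bind_current_process([0])))
```

This binds only the application. As noted above, the DPC processor is still chosen by RSS or TCP offloading, so the sketch is useful only for experiments, not as a recommended production setting.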
Single TCP Connection Performance Settings
One common benchmark is to run a single TCP connection between two computers as
fast as possible. The following are a few suggestions to deliver the best possible
performance:
• Use TCP window scaling with a 256 KB or 512 KB window. This may be
  controlled with some socket applications, such as ntttcp from Microsoft.
• Use send and receive buffers that are larger than 128 KB with an efficient
  application such as ntttcp.
• Disable RSS and use an interrupt filter driver. Experiment with all relative CPU
  affinities to find the best combination.
• Disable timestamps and SACK, because the test should run without dropping
  any packets.
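For a custom test program rather than ntttcp, the buffer suggestions above might be applied as in the following sketch. It uses the standard SO_RCVBUF and SO_SNDBUF socket options. The 512 KB value and the helper names are illustrative assumptions. The operating system may round or cap the requested sizes, and whether window scaling is actually negotiated also depends on system-wide TCP settings.

```python
import socket

BUFFER_BYTES = 512 * 1024  # illustrative value, larger than the 128 KB minimum


def make_benchmark_socket(buffer_bytes=BUFFER_BYTES):
    """Create an unconnected TCP socket with enlarged send and receive buffers."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    # Request the sizes before connect() or listen(), so the receive window
    # offered during the TCP handshake can reflect the larger buffer.
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF, buffer_bytes)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF, buffer_bytes)
    return sock


def effective_buffers(sock):
    """Return the (receive, send) buffer sizes the OS actually granted."""
    rcv = sock.getsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF)
    snd = sock.getsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF)
    return rcv, snd


if __name__ == "__main__":
    s = make_benchmark_socket()
    try:
        print("granted (receive, send) buffers:", effective_buffers(s))
    finally:
        s.close()
```

Checking the granted sizes is worthwhile because a silently capped buffer would limit the window and skew the benchmark.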