Dell Emulex Family of Adapters User Manual
Page 658

Emulex Drivers for Windows User Manual
P010077-01A Rev. A
3. Configuration
NIC Driver Configuration
TCP timers, including delayed ACK, push, retransmit and keep alive, are
implemented in hardware. This reduces host CPU usage.
Retransmits are handled entirely in hardware.
Packetizing data, including segmenting, checksums, and CRC, is supported.
The network driver should use send and receive buffers that are larger than 1
MB for maximum efficiency.
The driver provides efficient parallel processing of multiple TCP connections on
multi-CPU systems.
The adapter receive path is zero-copy for applications that prepost receive buffers or
that issue a socket read before the data arrives. Ideal applications use Microsoft's
Winsock2 Asynchronous Sockets API, which allows posting multiple receive buffers
with asynchronous completions, and posting multiple send operations with
asynchronous completions. Applications that do not prepost receive buffers may incur
the penalty of a data copy, which makes the performance improvement from offload
significantly less noticeable.
Applications that transmit large amounts of data show excellent CPU efficiency using
TCP offload. TCP offload allows the network driver to accept large buffers of data to
transmit. Processing each buffer costs the host roughly the same amount of work as
processing a single TCP packet for non-offloaded traffic. The entire process of
packetizing the data, processing
the incoming data acknowledgements, and potentially retransmitting any lost data is
handled by the hardware.
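
To give a sense of scale for the work the hardware absorbs, the following sketch counts how many TCP segments a single large transmit buffer is split into. The MSS of 1460 bytes (a common value for standard 1500-byte Ethernet frames without TCP options) and the function name segments_per_buffer are assumptions made for illustration; they are not taken from the adapter documentation.

```python
def segments_per_buffer(buffer_bytes: int, mss_bytes: int = 1460) -> int:
    """Return how many TCP segments are needed to carry buffer_bytes of payload.

    mss_bytes defaults to 1460, an assumed MSS for standard Ethernet frames.
    """
    if buffer_bytes < 0 or mss_bytes <= 0:
        raise ValueError("buffer_bytes must be >= 0 and mss_bytes must be > 0")
    # Integer ceiling division: a partially filled last segment still counts.
    return (buffer_bytes + mss_bytes - 1) // mss_bytes


one_mib = 1024 * 1024
print(segments_per_buffer(one_mib))  # prints 719
```

Under these assumptions, one 1 MiB buffer corresponds to 719 segments. Without offload, the host would perform per-packet work for each of those segments; with offload, it hands over a single buffer.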
TCP Offload Exclusions
Microsoft provides a method to exclude certain applications from being offloaded to
the adapter. Some applications do not benefit from TCP offload. These include
applications whose TCP connections are short-lived, transfer small
amounts of data at a time, exhibit fragmentation from end to end, or make use of IP
options.
If an application sends less data than the maximum segment size (MSS), the driver,
like most TCP stacks, uses the Nagle algorithm. Nagling reduces the number of TCP
packets on the network by
combining small application sends into one larger TCP packet. Nagling typically
reduces the performance of a single connection to allow greater overall performance for
a large group of connections.
During Nagling, a single connection may have long pauses (200 ms) between sending
subsequent packets, as the driver waits for more data from the application to append to
the packet. An application can disable Nagling by setting the TCP_NODELAY socket
option. TCP offload does not improve the performance of connections that use Nagling,
because the performance is intentionally limited by the Nagle algorithm. Telnet and
SSH consoles
are examples of connections that typically use Nagling.
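
As an illustration of disabling Nagling from an application, the following minimal sketch sets the socket option, which the sockets API names TCP_NODELAY, on a new TCP socket and reads it back. The helper name make_low_latency_socket is hypothetical, and the sketch uses Python's standard socket module rather than any Emulex-specific interface.

```python
import socket


def make_low_latency_socket() -> socket.socket:
    """Create a TCP socket with the Nagle algorithm disabled (hypothetical helper)."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    # A non-zero TCP_NODELAY value asks the stack to send small writes
    # immediately instead of waiting to combine them into a larger packet.
    sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)
    return sock


sock = make_low_latency_socket()
# Read the option back to confirm that the stack accepted it.
nodelay_enabled = sock.getsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY) != 0
sock.close()
print(nodelay_enabled)
```

Disabling Nagling trades a higher packet count for lower latency on small sends, so it is generally suited to interactive or request/response traffic rather than bulk transfers.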
Windows Server has not optimized the connection offload path. As a result, some
applications that use numerous short-lived TCP connections do not show a performance
improvement using TCP offload.
Windows Server provides control over the applications and TCP ports that are eligible
for TCP offload using the netsh tool. Refer to the Microsoft documentation for these
netsh commands: