Intel 253666-024US User Manual

INSTRUCTION SET REFERENCE, A-M

MOVNTPD—Store Packed Double-Precision Floating-Point Values Using Non-Temporal Hint

Description

Moves the double quadword in the source operand (second operand) to the
destination operand (first operand) using a non-temporal hint to minimize cache
pollution during the write to memory. The source operand is an XMM register, which
is assumed to contain two packed double-precision floating-point values. The
destination operand is a 128-bit memory location.
The non-temporal hint is implemented by using a write combining (WC) memory
type protocol when writing the data to memory. Using this protocol, the processor
does not write the data into the cache hierarchy, nor does it fetch the corresponding
cache line from memory into the cache hierarchy. The memory type of the region
being written to can override the non-temporal hint, if the memory address specified
for the non-temporal store is in an uncacheable (UC) or write protected (WP)
memory region. For more information on non-temporal stores, see “Caching of
Temporal vs. Non-Temporal Data” in Chapter 10 in the Intel® 64 and IA-32
Architectures Software Developer’s Manual, Volume 1.
Because the WC protocol uses a weakly-ordered memory consistency model, a
fencing operation implemented with the SFENCE or MFENCE instruction should be
used in conjunction with MOVNTPD instructions if multiple processors might use
different memory types to read/write the destination memory locations.
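
The fencing guidance above can be illustrated with a common producer pattern. This is a minimal sketch, not taken from the manual: the helper name fill_and_publish, the ready flag, and the use of C11 atomics are all assumptions made for illustration. It fills a buffer with non-temporal stores, issues SFENCE, and only then sets a flag that another thread might poll. The assumption is that the flag store by itself (an ordinary store on x86) does not order the preceding weakly-ordered WC stores, which is why the explicit fence is placed before it.

```c
#include <assert.h>
#include <emmintrin.h>   /* SSE2: __m128d, _mm_set1_pd, _mm_stream_pd */
#include <xmmintrin.h>   /* _mm_sfence */
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (not from the manual): fill dst[0..n) with `value`
 * using non-temporal stores, then publish completion through `ready`.
 * Assumes dst is 16-byte aligned and n is even, because each MOVNTPD
 * writes two doubles to a 128-bit memory location. */
static void fill_and_publish(double *dst, size_t n, double value,
                             atomic_int *ready)
{
    assert(((uintptr_t)dst & 15u) == 0);  /* alignment precondition */
    assert(n % 2 == 0);                   /* whole 128-bit chunks only */

    const __m128d v = _mm_set1_pd(value); /* both lanes hold `value` */
    for (size_t i = 0; i < n; i += 2) {
        _mm_stream_pd(dst + i, v);        /* compiles to MOVNTPD */
    }

    _mm_sfence();                         /* order the WC stores first */
    atomic_store_explicit(ready, 1, memory_order_release);
}
```

A consumer would wait until the flag reads 1 before touching the buffer. Whether SFENCE or MFENCE is the better choice depends on whether loads must also be ordered; this sketch only needs store ordering.
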
In 64-bit mode, use of the REX.R prefix permits this instruction to access additional
registers (XMM8-XMM15).

Operation

DEST ← SRC;

Intel C/C++ Compiler Intrinsic Equivalent

MOVNTPD void _mm_stream_pd(double *p, __m128d a)
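
As a minimal sketch of how the intrinsic might be called from C (the helper name store_pair_nt is hypothetical and exists only for this example), the code below packs two doubles into an __m128d and writes them with a single non-temporal store. It assumes an SSE2-capable x86 target and a 16-byte-aligned destination, which the instruction requires for its 128-bit memory operand.

```c
#include <assert.h>
#include <emmintrin.h>   /* SSE2: __m128d, _mm_set_pd, _mm_stream_pd */
#include <stdint.h>

/* Hypothetical helper (not from the manual): write the pair (lo, hi)
 * to dst[0] and dst[1] with one non-temporal 128-bit store.
 * dst must be 16-byte aligned. */
static void store_pair_nt(double *dst, double lo, double hi)
{
    assert(((uintptr_t)dst & 15u) == 0);  /* alignment precondition */

    /* _mm_set_pd takes the high element first, so `lo` lands in
     * the low 64 bits and is stored at dst[0]. */
    __m128d v = _mm_set_pd(hi, lo);
    _mm_stream_pd(dst, v);                /* compiles to MOVNTPD */
}
```

For example, calling store_pair_nt(buf, 1.5, 2.5) on an aligned two-element array leaves buf[0] equal to 1.5 and buf[1] equal to 2.5 when read back by the same thread.
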

SIMD Floating-Point Exceptions

None.

Opcode        Instruction         64-Bit Mode   Compat/Leg Mode   Description
66 0F 2B /r   MOVNTPD m128, xmm   Valid         Valid             Move packed double-precision floating-point values from xmm to m128 using non-temporal hint.