Intel 253666-024US User Manual

INSTRUCTION SET REFERENCE, A-M

MOVNTPS: Store Packed Single-Precision Floating-Point Values Using Non-Temporal Hint

Description

Moves the double quadword in the source operand (second operand) to the
destination operand (first operand) using a non-temporal hint to minimize cache
pollution during the write to memory. The source operand is an XMM register,
which is assumed to contain four packed single-precision floating-point values.
The destination operand is a 128-bit memory location.

The non-temporal hint is implemented by using a write combining (WC) memory
type protocol when writing the data to memory. Using this protocol, the
processor does not write the data into the cache hierarchy, nor does it fetch
the corresponding cache line from memory into the cache hierarchy. The memory
type of the region being written to can override the non-temporal hint, if the
memory address specified for the non-temporal store is in an uncacheable (UC)
or write protected (WP) memory region. For more information on non-temporal
stores, see “Caching of Temporal vs. Non-Temporal Data” in Chapter 10 in the
Intel® 64 and IA-32 Architectures Software Developer’s Manual, Volume 1.

Because the WC protocol uses a weakly-ordered memory consistency model, a
fencing operation implemented with the SFENCE or MFENCE instruction should be
used in conjunction with MOVNTPS instructions if multiple processors might use
different memory types to read/write the destination memory locations.

In 64-bit mode, use of the REX.R prefix permits this instruction to access
additional registers (XMM8-XMM15).
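
The fencing guidance above can be illustrated with a minimal C sketch. It fills a buffer with non-temporal stores and then issues SFENCE through the `_mm_sfence` intrinsic. The helper name `stream_fill` is hypothetical, and the sketch assumes the buffer is 16-byte aligned (MOVNTPS requires an aligned memory operand) and that the element count is a multiple of four. Whether the compiler actually emits MOVNTPS for `_mm_stream_ps` depends on the compiler and its options.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <xmmintrin.h>  /* SSE: _mm_set1_ps, _mm_stream_ps, _mm_sfence */

/* Hypothetical helper (not part of the manual): write `value` into `count`
 * floats starting at `dst` using non-temporal stores.
 * Assumptions: dst is 16-byte aligned, count is a multiple of 4. */
void stream_fill(float *dst, size_t count, float value)
{
    assert(((uintptr_t)dst & 15u) == 0);  /* aligned memory operand */
    assert(count % 4 == 0);               /* whole 128-bit stores only */

    __m128 v = _mm_set1_ps(value);        /* four copies of value */
    for (size_t i = 0; i < count; i += 4) {
        _mm_stream_ps(dst + i, v);        /* non-temporal 128-bit store */
    }

    /* The stores above are weakly ordered; SFENCE orders them before any
     * later store, such as a flag that publishes the buffer. */
    _mm_sfence();
}
```

A caller would typically set a "buffer ready" flag only after `stream_fill` returns, so that the fence sits between the streamed data and the flag write.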

Operation

DEST ← SRC;

Intel C/C++ Compiler Intrinsic Equivalent

MOVNTPS void _mm_stream_ps(float * p, __m128 a)
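
As a rough illustration of the intrinsic, the following C sketch stores four single-precision values to memory with one call to `_mm_stream_ps`. The wrapper name `store4_nt` and its parameters are hypothetical. The sketch assumes `p` points to 16-byte aligned storage for four floats.

```c
#include <assert.h>
#include <stdint.h>
#include <xmmintrin.h>  /* SSE: _mm_setr_ps, _mm_stream_ps */

/* Hypothetical wrapper (not part of the manual): pack four floats into an
 * XMM value and store them to p with a non-temporal hint.
 * Assumption: p is 16-byte aligned. */
void store4_nt(float *p, float a0, float a1, float a2, float a3)
{
    assert(((uintptr_t)p & 15u) == 0);       /* aligned memory operand */

    __m128 v = _mm_setr_ps(a0, a1, a2, a3);  /* a0 goes to the lowest element */
    _mm_stream_ps(p, v);                     /* DEST <- SRC, 128 bits */
}
```

After the call, `p[0]` through `p[3]` hold `a0` through `a3` as seen by the storing thread. Code that shares the buffer with other processors would normally follow the store with `_mm_sfence()`.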

SIMD Floating-Point Exceptions

None.

Opcode     Instruction         64-Bit Mode   Compat/Leg Mode   Description
0F 2B /r   MOVNTPS m128, xmm   Valid         Valid             Move packed single-precision
                                                               floating-point values from xmm
                                                               to m128 using non-temporal hint.