Intel IA-32 User Manual
Page 506
12-8 Vol. 3A
SSE, SSE2 AND SSE3 SYSTEM PROGRAMMING
•
The operating system can take the responsibility for automatically saving the x87 FPU,
MMX, XXM, and MXCSR registers as part of the task switch process (using an FXSAVE
instruction) and automatically restoring the state of the registers when a suspended task is
resumed (using an FXRSTOR instruction). Here, the x87 FPU/MMX/SSE/SSE2/SSE3
state must be saved as part of the task state. This approach is appropriate for preemptive
multitasking operating systems, where the application cannot know when it is going to be
preempted and cannot prepare in advance for task switching. Here, the operating system is
responsible for saving and restoring the task and the x87 FPU/MMX/SSE/SSE2/SSE3
state when necessary.
•
The operating system can take the responsibility for saving the x87 FPU, MMX, XXM,
and MXCSR registers as part of the task switch process, but delay the saving of the MMX
and x87 FPU state until an x87 FPU, MMX, or SSE/SSE2/SSE3 instruction is actually
executed by the new task. Using this approach, the x87 FPU/MMX/SSE/SSE2/SSE3 state
is saved only if an x87 FPU/MMX/SSE/SSE2/SSE3 instruction needs to be executed in the
new task. (See Section 12.5.1., “Using the TS Flag to Control the Saving of the x87 FPU,
MMX, SSE, SSE2 and SSE3 State,” for more information.)
12.5.1.
Using the TS Flag to Control the Saving of the
x87 FPU, MMX, SSE, SSE2 and SSE3 State
Saving the x87 FPU/MMX/SSE/SSE2/SSE3 state using FXSAVE requires processor overhead.
If the new task does not access x87 FPU, MMX, XXM, and MXCSR registers, avoid overhead
by not automatically saving the state on a task switch.
The TS flag in control register CR0 is provided to allow the operating system to delay saving
the x87 FPU/MMX/SSE/SSE2/SSE3 state until an instruction that actually accesses this state is
encountered in a new task. When the TS flag is set, the processor monitors the instruction stream
for an x87 FPU/MMX/SSE/SSE2/SSE3 instruction. When the processor detects one of these
instructions, it raises a device-not-available exception (#NM) prior to executing the instruction.
The device-not-available exception handler can then be used to save the x87
FPU/MMX/SSE/SSE2/SSE3 state for the previous task (using an FXSAVE instruction) and load
the x87 FPU/MMX/SSE/SSE2/SSE3 state for the current task (using an FXRSTOR instruction).
If the task never encounters an x87 FPU/MMX/SSE/SSE2/SSE3 instruction, the device-not-
available exception will not be raised and a task state will not be saved unnecessarily.
The TS flag can be set either explicitly (by executing a MOV instruction to control register CR0)
or implicitly (using the IA-32 architecture’s native task switching mechanism). When the native
task switching mechanism is used, the processor automatically sets the TS flag on a task switch.
After the device-not-available handler has saved the x87 FPU/MMX/SSE/SSE2/SSE3 state, it
should execute the CLTS instruction to clear the TS flag.
Figure 12-1 gives an example of an operating system that implements x87
FPU/MMX/SSE/SSE2/SSE3 state saving using the TS flag. In this example, task A is the
currently running task and task B is the new task. The operating system maintains a save area
for the x87 FPU/MMX/SSE/SSE2/SSE3 state for each task and defines a variable
(x87_MMX_SSE_SSE2_SSE3_StateOwner) that indicates the task that “owns” the state. In this
example, task A is the current owner.