Intel ARCHITECTURE IA-32 User Manual
Document Outline
- IA-32 Intel® Architecture Optimization Reference Manual
- Disclaimer
- Contents
- Introduction
- 1 IA-32 Intel® Architecture Processor Family Overview
- SIMD Technology
- Intel® Extended Memory 64 Technology (Intel® EM64T)
- Intel NetBurst® Microarchitecture
- Intel® Pentium® M Processor Microarchitecture
- Microarchitecture of Intel® Core™ Solo and Intel® Core™ Duo Processors
- Hyper-Threading Technology
- Multi-Core Processors
- 2 General Optimization Guidelines
- Tuning to Achieve Optimum Performance
- Tuning to Prevent Known Coding Pitfalls
- General Practices and Coding Guidelines
- Coding Rules, Suggestions and Tuning Hints
- Performance Tools
- Processor Perspectives
- Branch Prediction
- Memory Accesses
- Improving the Performance of Floating-point Applications
- Instruction Selection
- Complex Instructions
- Use of the lea Instruction
- Use of the inc and dec Instructions
- Use of the shift and rotate Instructions
- Flag Register Accesses
- Integer Divide
- Operand Sizes and Partial Register Accesses
- Prefixes and Instruction Decoding
- REP Prefix and Data Movement
- Address Calculations
- Clearing Registers
- Compares
- Floating Point/SIMD Operands
- Prolog Sequences
- Code Sequences that Operate on Memory Operands
- Instruction Scheduling
- Vectorization
- Miscellaneous
- Summary of Rules and Suggestions
- 3 Coding for SIMD Architectures
- 4 Optimizing for SIMD Integer Applications
- General Rules on SIMD Integer Code
- Using SIMD Integer with x87 Floating-point
- Data Alignment
- Data Movement Coding Techniques
- Unsigned Unpack
- Signed Unpack
- Interleaved Pack with Saturation
- Interleaved Pack without Saturation
- Non-Interleaved Unpack
- Extract Word
- Insert Word
- Move Byte Mask to Integer
- Packed Shuffle Word for 64-bit Registers
- Packed Shuffle Word for 128-bit Registers
- Unpacking/interleaving 64-bit Data in 128-bit Registers
- Data Movement
- Conversion Instructions
- Generating Constants
- Building Blocks
- Absolute Difference of Unsigned Numbers
- Absolute Difference of Signed Numbers
- Absolute Value
- Clipping to an Arbitrary Range [high, low]
- Packed Max/Min of Signed Word and Unsigned Byte
- Packed Multiply High Unsigned
- Packed Sum of Absolute Differences
- Packed Average (Byte/Word)
- Complex Multiply by a Constant
- Packed 32*32 Multiply
- Packed 64-bit Add/Subtract
- 128-bit Shifts
- Memory Optimizations
- Converting from 64-bit to 128-bit SIMD Integer
- 5 Optimizing for SIMD Floating-point Applications
- 6 Optimizing Cache Usage
- General Prefetch Coding Guidelines
- Hardware Prefetching of Data
- Prefetch and Cacheability Instructions
- Prefetch
- Cacheability Control
- Memory Optimization Using Prefetch
- Software-controlled Prefetch
- Hardware Prefetch
- Example of Effective Latency Reduction with H/W Prefetch
- Example of Latency Hiding with S/W Prefetch Instruction
- Software Prefetching Usage Checklist
- Software Prefetch Scheduling Distance
- Software Prefetch Concatenation
- Minimize Number of Software Prefetches
- Mix Software Prefetch with Computation Instructions
- Software Prefetch and Cache Blocking Techniques
- Hardware Prefetching and Cache Blocking Techniques
- Single-pass versus Multi-pass Execution
- Memory Optimization using Non-Temporal Stores
- Deterministic Cache Parameters
- 7 Multi-Core and Hyper-Threading Technology
- 8 64-bit Mode Coding Guidelines
- 9 Power Optimization for Mobile Usages
- Overview
- Mobile Usage Scenarios
- ACPI C-States
- Guidelines for Extending Battery Life
- A Application Performance Tools
- Intel® Compilers
- Intel® VTune™ Performance Analyzer
- Intel® Performance Libraries
- Enhanced Debugger (EDB)
- Intel® Threading Tools
- Intel® Software College
- B Using Performance Monitoring Events
- Pentium 4 Processor Performance Metrics
- Pentium 4 Processor-Specific Terminology
- Counting Clocks
- Microarchitecture Notes
- Metrics Descriptions and Categories
- Performance Metrics and Tagging Mechanisms
- Using Performance Metrics with Hyper-Threading Technology
- Using Performance Events of Intel Core Solo and Intel Core Duo processors
- C IA-32 Instruction Latency and Throughput
- D Stack Alignment
- E Mathematics of Prefetch Scheduling Distance
- Index
- Intel Sales Offices