Distributed Shared I/O, ccNUMA Architecture, Cache Coherency – Intel SGI Altix 450 User Manual

System Features

007-4857-002

Memory latency is the amount of time required for a processor to retrieve data from memory.
Memory latency is lowest when a processor accesses local memory.

Distributed Shared I/O

As with distributed shared memory (DSM), I/O devices are distributed among the blade nodes within
the IRUs (each base I/O blade node has two NUMAlink ports) and are accessible by all compute nodes
within the SSI through the NUMAlink interconnect fabric.

ccNUMA Architecture

As the name implies, the cache-coherent non-uniform memory access (ccNUMA) architecture has
two parts, cache coherency and non-uniform memory access, which are discussed in the sections
that follow.

Cache Coherency

The Altix 450 systems use caches to reduce memory latency. Although data exists in local or
remote memory, copies of the data can exist in various processor caches throughout the system.
Cache coherency keeps the cached copies consistent.

To keep the copies consistent, the ccNUMA architecture uses a directory-based coherence protocol.
In a directory-based coherence protocol, each block of memory (128 bytes) has an entry in a table
that is referred to as a directory. Like the blocks of memory that they represent, the directories are
distributed among the compute/memory blade nodes. A block of memory is also referred to as a
cache line.
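
The relationship between an address and its 128-byte block can be sketched as follows. This is
an illustrative sketch only: the function names are hypothetical, and the flat index ignores the
fact that the real directories are distributed among the blade nodes.

```python
# Hypothetical sketch: names and the flat index are illustrative,
# not taken from the Altix 450 hardware documentation.
CACHE_LINE_BYTES = 128  # block (cache line) size stated in the text


def directory_index(physical_address: int) -> int:
    """Return the index of the directory entry that covers this address."""
    return physical_address // CACHE_LINE_BYTES


def line_offset(physical_address: int) -> int:
    """Return the byte offset of the address within its cache line."""
    return physical_address % CACHE_LINE_BYTES
```

Under these assumptions, addresses 0 through 127 map to entry 0, and address 128 is the first
byte covered by entry 1.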

Each directory entry indicates the state of the memory block that it represents. For example, when
the block is not cached, it is in an unowned state. When only one processor has a copy of the
memory block, it is in an exclusive state. And when more than one processor has a copy of the
block, it is in a shared state; a bit vector indicates which caches contain a copy.
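
The three states described above can be modeled with a small sketch. It assumes a hypothetical
entry layout in which bit i of the vector is set when cache i holds a copy, and it derives the
state from the number of set bits; actual hardware may store the state separately and use a
different encoding.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class DirectoryEntry:
    """Hypothetical directory entry for one 128-byte memory block."""

    sharers: int = 0  # bit vector: bit i set means cache i holds a copy

    def add_sharer(self, cache_id: int) -> None:
        """Record that the given cache now holds a copy of the block."""
        self.sharers |= 1 << cache_id

    def holders(self) -> List[int]:
        """List the IDs of all caches whose bit is set in the vector."""
        return [i for i in range(self.sharers.bit_length())
                if (self.sharers >> i) & 1]

    def state(self) -> str:
        """Derive the block state from the number of cached copies."""
        copies = len(self.holders())
        if copies == 0:
            return "unowned"
        if copies == 1:
            return "exclusive"
        return "shared"
```

For example, a new entry reports "unowned"; after add_sharer(3) it reports "exclusive"; after a
further add_sharer(5) it reports "shared" and holders() returns [3, 5].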

When a processor modifies a block of data, the other processors that have the same block of data
in their caches must be notified of the modification. The Altix 450 server series uses an
invalidation method to maintain cache coherence. The invalidation method purges all unmodified
copies of the block of data, and the processor that wants to modify the block receives exclusive
ownership of the block.
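
A minimal sketch of the invalidation step, assuming the sharers of a block are tracked in a bit
vector where bit i stands for cache i. The function name and return shape are hypothetical and
chosen only for illustration; the sketch does not model message ordering or acknowledgments.

```python
from typing import List, Tuple


def handle_write(sharers: int, writer: int) -> Tuple[int, List[int]]:
    """Grant `writer` exclusive ownership of a block.

    `sharers` is a bit vector of the caches that hold the block
    (hypothetical layout). Returns the new bit vector and the IDs of
    the caches whose copies are invalidated.
    """
    writer_bit = 1 << writer
    # Every cache other than the writer must purge its copy.
    others = sharers & ~writer_bit
    invalidated = [i for i in range(others.bit_length()) if (others >> i) & 1]
    # After invalidation, only the writer's bit remains set.
    return writer_bit, invalidated
```

For example, if caches 0, 1, and 3 share a block (bit vector 0b1011) and cache 1 wants to modify
it, handle_write(0b1011, 1) returns (2, [0, 3]): caches 0 and 3 are invalidated and only the bit
for cache 1 remains set.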