ATI R700 Technology

Figure 2.2 Possible GPR Distribution Between Global, Clause Temps, and Private Registers

Note that the terms even and odd refer to the ALU execution pipelines to which the scheduler arbitrarily assigns wavefronts. The first instruction slot to which a wavefront is assigned is termed odd.

Both global and clause temp shared registers require the graphics pipeline (kernel hardware) to be flushed before changing resource allocation sizes (number of global registers, number of clause temp registers, etc.) for persistent shared use. They also require initialization prior to use. After any parallel atomic accumulation or reduction, the kernel pipeline must be flushed, followed by a special kernel that uses data sharing between lanes and/or SIMDs for a fast, on-chip final reduction. The result can be broadcast back to a persistent global register and read by a subsequent kernel launch as a global source operand. This process can be very useful for a data collection pass on an image, followed by a reduction kernel, then followed by a compute kernel that uses the reduced values to alter the source image. This can be done without CPU intervention or off-chip traffic.
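The collect, reduce, and apply sequence described above can be sketched as a host-side simulation. This is a minimal illustration only: it runs on the CPU, does not model pipeline flushes or real shared registers, and every name in it (collect_pass, reduce_pass, apply_pass, NUM_LANES) is hypothetical rather than part of the R700 instruction set.

```python
# Host-side simulation of the three-pass flow; no GPU is involved.
# All names and the lane count are hypothetical, chosen for illustration.

NUM_LANES = 4  # illustrative lane count, not an R700 hardware value


def collect_pass(image, num_lanes=NUM_LANES):
    """Pass 1: each lane accumulates a partial sum into its own shared slot."""
    partial = [0] * num_lanes  # shared slots are initialized before use
    for i, pixel in enumerate(image):
        partial[i % num_lanes] += pixel
    return partial


def reduce_pass(partial):
    """Pass 2: final reduction of the per-lane partial sums to one value."""
    return sum(partial)


def apply_pass(image, total):
    """Pass 3: use the reduced value (here, the integer mean) to alter the image."""
    mean = total // len(image)
    return [pixel - mean for pixel in image]


image = [10, 20, 30, 40, 50, 60, 70, 80]
partial = collect_pass(image)      # per-lane partial sums
total = reduce_pass(partial)       # single reduced value
result = apply_pass(image, total)  # image with the mean subtracted
```

In this sketch the reduced value plays the role of the broadcast global source operand: it is computed once and then read by every element of the final pass.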

Physically, the GPRs are ordered from zero as: global, clause_temp, private. Note that this ordering allows a program to use the MOV_INDEX_GLOBAL instruction to access beyond the global registers into the clause temp registers.

Global shared registers and clause temp registers must fit within the first 128 GPRs, due to ALU-instruction destination-GPR field-size limits.
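The ordering and the 128-register limit can be checked with a small calculation. The sketch below is an assumption-laden illustration: gpr_layout and region_of are hypothetical helpers, the register counts are arbitrary inputs, and num_clause_temp is taken to mean the total number of clause temp registers across all pools.

```python
# Hypothetical helpers; register counts are illustrative inputs, not hardware defaults.
MAX_SHARED_GPRS = 128  # from the text: shared registers must fit in the first 128 GPRs


def gpr_layout(num_global, num_clause_temp):
    """Return the base GPR index of each region, ordered global, clause_temp, private."""
    shared = num_global + num_clause_temp
    if shared > MAX_SHARED_GPRS:
        raise ValueError(
            f"shared registers ({shared}) exceed the first {MAX_SHARED_GPRS} GPRs"
        )
    return {"global": 0, "clause_temp": num_global, "private": shared}


def region_of(index, num_global, num_clause_temp, num_private):
    """Name the region that a physical GPR index falls in."""
    layout = gpr_layout(num_global, num_clause_temp)
    if index < 0 or index >= layout["private"] + num_private:
        raise IndexError(f"GPR index {index} is outside the allocated registers")
    if index < layout["clause_temp"]:
        return "global"
    if index < layout["private"]:
        return "clause_temp"
    return "private"


layout = gpr_layout(num_global=8, num_clause_temp=4)
# An index just past the last global register lands in the clause temp region.
past_global = region_of(8, num_global=8, num_clause_temp=4, num_private=20)
```

With 8 global and 4 clause temp registers, index 8 is the first clause temp register, which is why indexing past the end of the global range reaches the clause temps.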

SIMD-global GPRs are enabled only in the dynamic GPR mode.

2.6.2 Local Data Share (LDS)

Each SIMD has a 16 kB memory space that enables low-latency communication between threads within a thread group, or the threads within a wavefront. This memory is configured with four banks, each with 256 entries of 16 bytes. The write port of the memory uses an owner’s write model, which enables each thread to write data to private locations. All of the write address logic is provided in discrete hardware, and the instruction provides the stride per thread and offset
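The LDS geometry given above can be verified arithmetically, and the owner's write idea can be illustrated with a simple address rule. The geometry constants come from the text; the address rule itself (thread_id * stride + offset, in bytes) is an assumption made for illustration, since this excerpt does not fully specify how the hardware combines stride and offset. The function name owner_write_address is hypothetical.

```python
# Geometry taken from the text.
LDS_BANKS = 4
ENTRIES_PER_BANK = 256
BYTES_PER_ENTRY = 16
LDS_BYTES = LDS_BANKS * ENTRIES_PER_BANK * BYTES_PER_ENTRY  # 16384 bytes = 16 kB


def owner_write_address(thread_id, stride, offset):
    """Assumed rule: each thread owns a stride-sized region and writes inside it."""
    if thread_id < 0:
        raise ValueError("thread_id must be non-negative")
    if not 0 <= offset < stride:
        raise ValueError("offset must fall inside the thread's own region")
    address = thread_id * stride + offset
    if address >= LDS_BYTES:
        raise ValueError("address falls outside the 16 kB LDS")
    return address


# 64 threads, each owning one 16-byte region, all writing at byte offset 4.
addresses = [owner_write_address(t, stride=16, offset=4) for t in range(64)]
```

Because every thread's region is disjoint under this assumed rule, no two threads produce the same write address, which is the property the owner's write model relies on.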

[Figure 2.2 labels: GLB GPR Pool; Per Wavefront Pool; ClauseTmp Even Pool; ClauseTmp Odd Pool; Private; Clause Shared; Global Shared]
