Boxis R700 User Manual, Page 33

ATI R700 Technology
Data Sharing 2-15
Copyright © 2009 Advanced Micro Devices, Inc. All rights reserved.
Figure 2.2 Possible GPR Distribution Between Global, Clause Temps, and
Private Registers
Note that the terms even and odd refer to the ALU execution pipelines to which
the scheduler arbitrarily assigns wavefronts. The first instruction slot to which a
wavefront is assigned is termed odd.
Both global and clause temp shared registers require the graphics pipeline
(kernel hardware) to be flushed before resource allocation sizes (number of
global registers, number of clause temp registers, etc.) are changed for
persistent shared use. They also require initialization prior to use. After any
parallel atomic accumulation or reduction, the kernel pipeline must be flushed,
followed by a special kernel that uses data sharing between lanes and/or SIMDs
for a fast, on-chip final reduction. The result can be broadcast back to a global
persistent register in each register file of each SIMD, where it can be used
persistently across a subsequent kernel launch as a global src operand. This
process can be very useful for a data collection pass on an image, followed by a
reduction kernel, then a compute kernel that uses the reduced values to alter
the source image. All of this can be done without CPU intervention or off-chip
traffic.
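The collection, reduction, and compute sequence described above can be sketched as a small host-side simulation. This is only an illustration of the data flow, not R700 code: the function names, the SIMD count, and the choice of a sum/mean as the reduced value are all assumptions made for the example.

```python
# Hypothetical simulation of the three-pass flow; none of these names are R700 APIs.
NUM_SIMDS = 4  # assumed SIMD count, chosen only for the example


def collection_pass(image, num_simds=NUM_SIMDS):
    """Pass 1: each SIMD accumulates a partial sum over its share of the pixels."""
    partial = [0] * num_simds
    for i, pixel in enumerate(image):
        partial[i % num_simds] += pixel
    return partial


def reduction_pass(partial):
    """Pass 2: final reduction across SIMDs, then broadcast of the result
    into one (simulated) global persistent register per SIMD."""
    total = sum(partial)
    return [total] * len(partial)


def compute_pass(image, global_regs, num_simds=NUM_SIMDS):
    """Pass 3: each SIMD reads its own broadcast copy as a source operand
    and uses it to alter the image (here: subtract the mean)."""
    n = len(image)
    return [pixel - global_regs[i % num_simds] / n for i, pixel in enumerate(image)]


def run_pipeline(image):
    partial = collection_pass(image)
    # On hardware, the kernel pipeline would be flushed at this point.
    global_regs = reduction_pass(partial)
    return compute_pass(image, global_regs)
```

In this sketch the reduced value never leaves the simulated register files between passes, which mirrors the point that the sequence needs no CPU intervention or off-chip traffic.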
Physically, the GPRs are ordered from zero as: global, clause_temp, private.
Note that this ordering allows a program to use the MOV_INDEX_GLOBAL
instruction to access beyond the global registers into the clause temp registers.
Global shared registers and clause temp registers must fit within the first 128
GPRs, due to ALU-instruction dest-GPR field-size limits.
SIMD-global GPRs are enabled only in the dynamic GPR mode.
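The physical ordering and the 128-GPR limit can be expressed as a short layout calculation. The sketch below is illustrative only: `gpr_layout` and `region_of` are hypothetical helpers, and the clause temp registers are treated as a single contiguous region, which is a simplification assumed for the example.

```python
MAX_SHARED_GPRS = 128  # from the text: global + clause temp must fit in the first 128 GPRs


def gpr_layout(num_global, num_clause_temp, num_private):
    """Return {region: (base, count)} with regions ordered from GPR 0 as
    global, clause_temp, private. Hypothetical helper, not a driver API."""
    if min(num_global, num_clause_temp, num_private) < 0:
        raise ValueError("register counts must be non-negative")
    if num_global + num_clause_temp > MAX_SHARED_GPRS:
        raise ValueError("global + clause temp registers exceed the first 128 GPRs")
    clause_temp_base = num_global
    private_base = clause_temp_base + num_clause_temp
    return {
        "global": (0, num_global),
        "clause_temp": (clause_temp_base, num_clause_temp),
        "private": (private_base, num_private),
    }


def region_of(index, layout):
    """Name the region that a physical GPR index falls into."""
    for name, (base, count) in layout.items():
        if base <= index < base + count:
            return name
    raise IndexError("GPR index outside the allocated range")
```

With 8 global and 4 clause temp registers, index 8 is the first index past the global range and falls in the clause temp region, which is the situation the MOV_INDEX_GLOBAL note describes.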
2.6.2 Local Data Share (LDS)
Each SIMD has a 16 kB memory space that enables low-latency communication
between threads within a thread group, or the threads within a wavefront. This
memory is configured with four banks, each with 256 entries of 16 bytes. The
write port of the memory uses an owner’s write model, which enables each
thread to write data to private locations. All of the write address logic is provided
in discrete hardware, and the instruction provides the stride per thread and offset
[Figure 2.2 labels: GLB GPR Pool, Per Wavefront Pool, ClauseTmp Even Pool, ClauseTmp Odd Pool, Private, Clause Shared, Global Shared]
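The LDS geometry given in section 2.6.2 (four banks, each with 256 entries of 16 bytes) can be checked against the stated 16 kB size with a few lines of arithmetic. The constant and function names below are illustrative, and the sketch assumes kB means 1024 bytes.

```python
# Values taken from the text; names are hypothetical.
LDS_BANKS = 4
ENTRIES_PER_BANK = 256
BYTES_PER_ENTRY = 16


def lds_capacity_bytes(banks=LDS_BANKS, entries=ENTRIES_PER_BANK, entry_bytes=BYTES_PER_ENTRY):
    """Total LDS size per SIMD in bytes: banks * entries per bank * bytes per entry."""
    return banks * entries * entry_bytes


def lds_capacity_dwords():
    """The same capacity counted in 32-bit (4-byte) dwords."""
    return lds_capacity_bytes() // 4
```

The product is 4 × 256 × 16 = 16384 bytes, which matches 16 kB per SIMD.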