More iteration space tiling

Abstract

Subdividing the iteration space of a loop into blocks, or tiles, with a fixed maximum size has several advantages. Tiles are a natural unit of work for parallel task scheduling. Processors can synchronize between tiles rather than between individual iterations, which reduces synchronization frequency at some loss of potential parallelism. The shape and size of a tile can be chosen to exploit memory locality and so make better use of the memory hierarchy. Vectorization and register locality fit naturally into optimization within a tile, while parallelization and cache locality fit naturally into optimization between tiles.
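As a concrete illustration, the following minimal Python sketch applies plain rectangular tiling to matrix multiplication. It is not the paper's method. The function names `tiles`, `matmul_tiled` and `matmul_naive` and the default tile sizes are hypothetical, the tiles are simple rectangles, and Python lists stand in for the arrays a compiler would transform. The outer loops walk between tiles and the inner loops walk within one tile.

```python
def tiles(n, m, ti, tj):
    """Yield (i0, i1, j0, j1) bounds covering an n x m iteration space
    with rectangular tiles of at most ti x tj iterations (hypothetical helper)."""
    for i0 in range(0, n, ti):
        for j0 in range(0, m, tj):
            # min() clips the last tile in each dimension, so ti x tj is a maximum size
            yield i0, min(i0 + ti, n), j0, min(j0 + tj, m)


def matmul_tiled(a, b, ti=32, tj=32, tk=32):
    """Compute a @ b with the i, j and k loops tiled (tile sizes are arbitrary defaults)."""
    n, p, m = len(a), len(b), len(b[0])
    c = [[0] * m for _ in range(n)]
    # Between-tile loops: each (i, j) tile writes a disjoint block of c, so these
    # tiles could be handed to separate workers; this sketch runs them sequentially.
    for i0, i1, j0, j1 in tiles(n, m, ti, tj):
        # Tile the reduction dimension too, keeping the blocks of a and b touched small
        for k0 in range(0, p, tk):
            k1 = min(k0 + tk, p)
            # Within-tile loops: the innermost j loop runs over contiguous elements
            for i in range(i0, i1):
                ai, ci = a[i], c[i]
                for k in range(k0, k1):
                    aik, bk = ai[k], b[k]
                    for j in range(j0, j1):
                        ci[j] += aik * bk[j]
    return c


def matmul_naive(a, b):
    """Untiled reference version for comparison."""
    p, m = len(b), len(b[0])
    return [[sum(row[k] * b[k][j] for k in range(p)) for j in range(m)] for row in a]


if __name__ == "__main__":
    print(list(tiles(5, 5, 2, 3)))
    # prints [(0, 2, 0, 3), (0, 2, 3, 5), (2, 4, 0, 3), (2, 4, 3, 5), (4, 5, 0, 3), (4, 5, 3, 5)]
    print(matmul_tiled([[1, 2], [3, 4]], [[5, 6], [7, 8]], 1, 1, 1))
    # prints [[19, 22], [43, 50]]
```

Because `min()` clips the tiles at the edges of the iteration space, the tile size is a maximum rather than an exact size, and the boundary tiles are smaller. The tiled version visits every iteration exactly once and only changes the order of the visits. In practice the tile sizes would be chosen to match cache and register capacities. The defaults of 32 here are arbitrary.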

Keywords

FIXED MAXIMUM SIZE
NATURAL CANDIDATE
POTENTIAL PARALLELISM
PARALLEL TASK SCHEDULING
PARALLELIZATION
DATA DEPENDENCE
MEMORY HIERARCHY OPTIMIZATION
SYNCHRONIZATION FREQUENCY
ITERATION SPACE TILING
CACHE LOCALITY
MEMORY LOCALITY
ITERATION SPACE
MEMORY HIERARCHY UTILIZATION
DATA MINING
REGISTERS
SHAPE
