Polyhedral parallel code generation for CUDA

20 January 2013

journal article
research article
Published by Association for Computing Machinery (ACM) in ACM Transactions on Architecture and Code Optimization

Vol. 9 (4), 1-23
https://doi.org/10.1145/2400682.2400713

Abstract

This article addresses the compilation of a sequential program for parallel execution on a modern GPU. To this end, we present a novel source-to-source compiler called PPCG. PPCG stands out for its ability to accelerate computations from any static control loop nest, generating multiple CUDA kernels when necessary. We introduce a multilevel tiling strategy and a code generation scheme for the parallelization and locality optimization of imperfectly nested loops, managing memory and exposing concurrency according to the constraints of modern GPUs. We evaluate our algorithms and tool on the entire PolyBench suite.
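
As a rough illustration of the tiling idea the abstract refers to, the sketch below tiles a matrix-multiply loop nest by hand in plain C. It is not PPCG output and does not reproduce the article's algorithm; the function names, the `TILE` size, and the choice of matrix multiplication are assumptions made only for this example. The outer tile loops are the ones a GPU mapping would typically distribute over thread blocks, the inner point loops over threads, and the small staging buffers stand in for on-chip shared memory.

```c
#include <assert.h>
#include <stddef.h>

#define TILE 4 /* hypothetical tile size, chosen only for illustration */

/* Reference version: an untiled static control loop nest computing
 * C = A * B for n x n row-major matrices. */
void matmul_naive(size_t n, const float *A, const float *B, float *C) {
    for (size_t i = 0; i < n; i++)
        for (size_t j = 0; j < n; j++) {
            float acc = 0.0f;
            for (size_t k = 0; k < n; k++)
                acc += A[i * n + k] * B[k * n + j];
            C[i * n + j] = acc;
        }
}

/* Hand-tiled version of the same nest. Assumes n is a multiple of TILE. */
void matmul_tiled(size_t n, const float *A, const float *B, float *C) {
    assert(n % TILE == 0);
    /* Tile loops: candidates for mapping to thread blocks on a GPU. */
    for (size_t ii = 0; ii < n; ii += TILE)
        for (size_t jj = 0; jj < n; jj += TILE) {
            float acc[TILE][TILE] = {{0.0f}};
            for (size_t kk = 0; kk < n; kk += TILE) {
                /* Staging buffers: stand-ins for GPU shared memory. */
                float a_tile[TILE][TILE];
                float b_tile[TILE][TILE];
                for (size_t r = 0; r < TILE; r++)
                    for (size_t c = 0; c < TILE; c++) {
                        a_tile[r][c] = A[(ii + r) * n + (kk + c)];
                        b_tile[r][c] = B[(kk + r) * n + (jj + c)];
                    }
                /* Point loops: candidates for mapping to threads. */
                for (size_t i = 0; i < TILE; i++)
                    for (size_t j = 0; j < TILE; j++)
                        for (size_t k = 0; k < TILE; k++)
                            acc[i][j] += a_tile[i][k] * b_tile[k][j];
            }
            /* Write the finished tile back to the output matrix. */
            for (size_t i = 0; i < TILE; i++)
                for (size_t j = 0; j < TILE; j++)
                    C[(ii + i) * n + (jj + j)] = acc[i][j];
        }
}
```

Calling both functions on the same inputs should give matching results, since the tiled version accumulates each output element in the same order of `k` as the reference. The benefit of tiling is that each staged element of `A` and `B` is reused `TILE` times from the small buffer rather than reloaded from main memory.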


Cited by 229 articles