VeGen: a vectorizer generator for SIMD and beyond

17 April 2021

conference paper
Published by Association for Computing Machinery (ACM)

https://doi.org/10.1145/3445814.3446692

Abstract

Vector instructions are ubiquitous in modern processors. Traditional compiler auto-vectorization techniques have focused on targeting single instruction multiple data (SIMD) instructions. However, these auto-vectorization techniques are not sufficiently powerful to model non-SIMD vector instructions, which can accelerate applications in domains such as image processing, digital signal processing, and machine learning. To target non-SIMD instructions, compiler developers have resorted to complicated, ad hoc peephole optimizations, expending significant development time while still coming up short. As vector instruction sets continue to evolve rapidly, compilers cannot keep up with these new hardware capabilities. In this paper, we introduce Lane Level Parallelism (LLP), which captures the model of parallelism implemented by both SIMD and non-SIMD vector instructions. We present VeGen, a vectorizer generator that automatically generates a vectorization pass to uncover target-architecture-specific LLP in programs while using only instruction semantics as input. VeGen decouples, yet coordinates, automatically generated target-specific vectorization utilities with its target-independent vectorization algorithm. This design enables us to systematically target non-SIMD vector instructions that until now have required ad hoc coordination between different compiler stages. We show that VeGen can use non-SIMD vector instructions effectively, for example, achieving a 3× speedup (compared to LLVM’s vectorizer) on x265’s idct4 kernel.
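The distinction between SIMD and non-SIMD vector instructions can be illustrated with a small sketch. It models instruction semantics as plain Python functions over lists of lanes. The choice of x86 `pmaddwd` as the non-SIMD example is an assumption made here for illustration (the abstract does not name a specific instruction), and the function names are hypothetical; this is not VeGen's own representation of instruction semantics.

```python
from typing import List


def simd_add(a: List[int], b: List[int]) -> List[int]:
    """SIMD-style semantics: output lane i depends only on input lane i."""
    assert len(a) == len(b)
    return [x + y for x, y in zip(a, b)]


def _wrap_i32(v: int) -> int:
    """Wrap an integer to signed 32-bit two's complement."""
    v &= 0xFFFFFFFF
    return v - (1 << 32) if v >= (1 << 31) else v


def pmaddwd(a: List[int], b: List[int]) -> List[int]:
    """Non-SIMD semantics modeled on x86 pmaddwd (illustrative assumption).

    Inputs are lanes of signed 16-bit values. Output lane i combines two
    adjacent input lanes: a[2i]*b[2i] + a[2i+1]*b[2i+1], as a 32-bit result.
    The output has half as many lanes as the inputs.
    """
    assert len(a) == len(b) and len(a) % 2 == 0
    return [
        _wrap_i32(a[2 * i] * b[2 * i] + a[2 * i + 1] * b[2 * i + 1])
        for i in range(len(a) // 2)
    ]


if __name__ == "__main__":
    print(simd_add([1, 2, 3, 4], [5, 6, 7, 8]))
    print(pmaddwd([1, 2, 3, 4], [5, 6, 7, 8]))
```

In `simd_add` every lane performs the same isolated operation, which is the pattern traditional auto-vectorizers look for. In `pmaddwd` each output lane reads from two input lanes and mixes multiplication with addition, so a purely lane-for-lane model does not describe it.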

Funding Information

Defense Advanced Research Projects Agency (HR0011-18-3-0007)
U.S. Department of Energy, Office of Science, Office of Advanced Scientific Computing Research (DESC0018121)

