Efficient SIMD Code Generation for Runtime Alignment and Length Conversion
- 31 March 2005
- conference paper
- Published by Institute of Electrical and Electronics Engineers (IEEE) in International Symposium on Code Generation and Optimization
- pp. 153-164
- https://doi.org/10.1109/cgo.2005.18
Abstract
When generating code for today's multimedia extensions, one of the major challenges is to deal with memory alignment issues. While hand programming still yields the best-performing SIMD code, it is both time consuming and error prone. Compiler technology has greatly improved, including techniques that simdize loops with misaligned accesses by automatically rearranging misaligned memory streams in registers. Current techniques are applicable to runtime alignments, but they aggressively reduce the alignment overhead only when all alignments are known at compile time. This paper presents two major enhancements to the state of the art, improving both performance and coverage. First, we propose a novel technique to simdize loops with runtime alignment nearly as efficiently as those with compile-time misalignment. Runtime alignment is pervasive in real applications because it is either part of the algorithms, or it is an artifact of the compiler's inability to extract accurate alignment information from complex applications. Second, we incorporate length conversion operations, i.e., conversions between data of different sizes, into the alignment handling framework. Length conversions are pervasive in multimedia applications where mixed integer types are often used. Supporting length conversion can greatly improve the coverage of simdizable loops. Experimental results indicate that our runtime alignment technique achieves a 19% to 32% speedup increase over prior art for a benchmark stressing the impact of misaligned data. We also demonstrate speedup factors of up to 8.11 for real benchmarks over sequential execution.
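The two ideas in the abstract, rearranging a misaligned memory stream in registers when its offset is only known at runtime, and length conversion between data of different sizes, can be illustrated with a small byte-level emulation. This is a minimal sketch under stated assumptions, not the authors' algorithm: the 16-byte vector width and the helper names `load_aligned`, `shift_pair`, `load_stream_realigned`, and `widen_i16_to_i32` are hypothetical, and Python byte strings stand in for vector registers. The last helper shows why length conversion complicates simdization: one input vector of 16-bit data yields two output vectors of 32-bit data.

```python
import struct

VBYTES = 16  # assumed vector width in bytes (128-bit registers, as on AltiVec/SSE)


def load_aligned(mem, addr):
    """Hypothetical aligned load: ignores the low address bits, returning the
    VBYTES-byte chunk that contains addr."""
    base = addr - addr % VBYTES
    return mem[base:base + VBYTES]


def shift_pair(first, second, off):
    """Hypothetical two-register shift: select bytes off..off+VBYTES-1 of the
    concatenation first||second (a permute-style realignment)."""
    return (first + second)[off:off + VBYTES]


def load_stream_realigned(mem, addr, n):
    """Read n bytes starting at a possibly misaligned addr using only aligned
    loads plus shifts. The offset is computed at runtime, and each aligned
    chunk is loaded once and reused as the first operand of the next shift."""
    if n % VBYTES:
        raise ValueError("n must be a multiple of VBYTES")
    off = addr % VBYTES  # runtime alignment, not known at compile time
    out = bytearray()
    prev = load_aligned(mem, addr)
    for i in range(0, n, VBYTES):
        nxt = load_aligned(mem, addr + i + VBYTES)
        out += shift_pair(prev, nxt, off)
        prev = nxt  # carry the loaded chunk into the next iteration
    return bytes(out)


def widen_i16_to_i32(vec):
    """Length conversion sketch: sign-extend one vector of eight little-endian
    int16 lanes into two vectors of four int32 lanes (like unpack low/high)."""
    lanes = struct.unpack("<8h", vec)
    lo = struct.pack("<4i", *lanes[:4])
    hi = struct.pack("<4i", *lanes[4:])
    return lo, hi
```

For example, with `mem = bytes(range(64))`, `load_stream_realigned(mem, 3, 32)` returns the same bytes as `mem[3:35]`, although every load it issues starts at a multiple of 16. Note that the loop always reads one aligned chunk ahead, so a real code generator must ensure that extra load stays within accessible memory.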
This publication has 9 references indexed in Scilit:
- Vectorization for SIMD architectures with alignment constraints. Association for Computing Machinery (ACM), 2004.
- Vectorizing for a SIMdD DSP architecture. Association for Computing Machinery (ACM), 2003.
- Increasing and detecting memory address congruence. Institute of Electrical and Electronics Engineers (IEEE), 2003.
- Simple vector microprocessors for multimedia applications. Institute of Electrical and Electronics Engineers (IEEE), 2002.
- Automatic Intra-Register Vectorization for the Intel® Architecture. International Journal of Parallel Programming, 2002.
- Exploiting superword level parallelism with multimedia instruction sets. ACM SIGPLAN Notices, 2000.
- A Vectorizing Compiler for Multimedia Extensions. International Journal of Parallel Programming, 2000.
- Compilation Techniques for Multimedia Processors. International Journal of Parallel Programming, 2000.
- Automatic translation of FORTRAN programs to vector form. ACM Transactions on Programming Languages and Systems, 1987.