Understanding sources of inefficiency in general-purpose chips
- 19 June 2010
- journal article
- Published by Association for Computing Machinery (ACM) in ACM SIGARCH Computer Architecture News
- Vol. 38 (3), 37-47
- https://doi.org/10.1145/1816038.1815968
Abstract
Due to their high volume, general-purpose processors, and now chip multiprocessors (CMPs), are much more cost effective than ASICs, but lag significantly in terms of performance and energy efficiency. This paper explores the sources of these performance and energy overheads in general-purpose processing systems by quantifying the overheads of a 720p HD H.264 encoder running on a general-purpose CMP system. It then explores methods to eliminate these overheads by transforming the CPU into a specialized system for H.264 encoding. We evaluate the gains from customizations useful to broad classes of algorithms, such as SIMD units, as well as those specific to a particular computation, such as customized storage and functional units. The ASIC is 500x more energy efficient than our original four-processor CMP. Broadly applicable optimizations improve performance by 10x and energy by 7x. However, the very low energy costs of actual core ops (100s fJ in 90nm) mean that over 90% of the energy used in these solutions is still "overhead". Achieving ASIC-like performance and efficiency requires algorithm-specific optimizations. For each sub-algorithm of H.264, we create a large, specialized functional unit that is capable of executing 100s of operations per instruction. This improves performance and energy by an additional 25x, and the final customized CMP matches an ASIC solution's performance within 3x of its energy and within comparable area.
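The figures quoted in the abstract can be checked against one another with a few lines of arithmetic. The sketch below is not from the paper: it assumes the broad (10x performance, 7x energy) and algorithm-specific (additional 25x) gains compose multiplicatively, and the constant and function names are illustrative only.

```python
# Factors quoted in the abstract, all relative to the original four-processor CMP.
ASIC_ENERGY_GAIN = 500.0   # ASIC energy efficiency vs. the original CMP
BROAD_PERF_GAIN = 10.0     # performance gain from broadly applicable optimizations
BROAD_ENERGY_GAIN = 7.0    # energy gain from broadly applicable optimizations
SPECIFIC_GAIN = 25.0       # additional gain (performance and energy) from specialized units


def cumulative_gains():
    """Return (performance gain, energy gain, remaining energy gap to the ASIC).

    Assumption: the two optimization stages multiply.
    """
    perf = BROAD_PERF_GAIN * SPECIFIC_GAIN
    energy = BROAD_ENERGY_GAIN * SPECIFIC_GAIN
    gap_to_asic = ASIC_ENERGY_GAIN / energy
    return perf, energy, gap_to_asic


perf, energy, gap = cumulative_gains()
print(f"performance gain: {perf:.0f}x")
print(f"energy gain:      {energy:.0f}x")
print(f"energy gap to ASIC: {gap:.2f}x")
```

Under that assumption the customized CMP is about 175x more energy efficient than the baseline, which leaves a gap of roughly 2.9x to the 500x ASIC, consistent with the "within 3x of its energy" claim.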