LogCA

24 June 2017

journal article
conference paper
Published by Association for Computing Machinery (ACM) in ACM SIGARCH Computer Architecture News

Vol. 45 (2), 375-388
https://doi.org/10.1145/3140659.3080216

Abstract

With the end of Dennard scaling, architects have increasingly turned to special-purpose hardware accelerators to improve the performance and energy efficiency of some applications. Unfortunately, accelerators do not always live up to expectations and may under-perform in some situations. Understanding the factors that affect the performance of an accelerator is crucial for both architects and programmers early in the design stage. Detailed models can be highly accurate, but often require low-level details that are not available until late in the design cycle. In contrast, simple analytical models can provide useful insights by abstracting away low-level system details. In this paper, we propose LogCA, a high-level performance model for hardware accelerators. LogCA helps both programmers and architects identify performance bounds and design bottlenecks early in the design cycle, and provides insight into which optimizations may alleviate these bottlenecks. We validate our model across a variety of kernels, ranging from sub-linear to super-linear complexities, on both on-chip and off-chip accelerators. We also describe the utility of LogCA using two retrospective case studies. First, we discuss the evolution of interface design in SUN/Oracle's encryption accelerators. Second, we discuss the evolution of memory interface design in three different GPU architectures. In both cases, we show that the design optimizations adopted for these machines are similar to the optimizations LogCA suggests. We argue that architects and programmers can use insights from these retrospective studies to improve future designs.

Funding Information

National Science Foundation (CNS-1302260, CCF-1438992, CCF-1533885, CCF-1617824)

Cited by 8 articles