Plasticine

24 June 2017

journal article
conference paper
Published by Association for Computing Machinery (ACM) in ACM SIGARCH Computer Architecture News

Vol. 45 (2), 389-402
https://doi.org/10.1145/3140659.3080256

Abstract

Reconfigurable architectures have gained popularity in recent years as they allow the design of energy-efficient accelerators. Fine-grain fabrics (e.g., FPGAs) have traditionally suffered from performance and power inefficiencies due to bit-level reconfigurable abstractions. Both fine-grain and coarse-grain architectures (e.g., CGRAs) traditionally require low-level programming and suffer from long compilation times. We address both challenges with Plasticine, a new spatially reconfigurable architecture designed to efficiently execute applications composed of parallel patterns. Parallel patterns have emerged from recent research on parallel programming as powerful, high-level abstractions that can elegantly capture data locality, memory access patterns, and parallelism across a wide range of dense and sparse applications. We motivate Plasticine by first observing key application characteristics captured by parallel patterns that are amenable to hardware acceleration, such as hierarchical parallelism, data locality, memory access patterns, and control flow. Based on these observations, we architect Plasticine as a collection of Pattern Compute Units and Pattern Memory Units. Pattern Compute Units are multi-stage pipelines of reconfigurable SIMD functional units that can efficiently execute nested patterns. Data locality is exploited in Pattern Memory Units using banked scratchpad memories and configurable address decoders. Multiple on-chip address generators and scatter-gather engines make efficient use of DRAM bandwidth by supporting a large number of outstanding memory requests, memory coalescing, and burst mode for dense accesses. Plasticine has an area footprint of 113 mm² in a 28 nm process, and consumes a maximum power of 49 W at a 1 GHz clock. Using a cycle-accurate simulator, we demonstrate that Plasticine provides an improvement of up to 76.9x in performance-per-Watt over a conventional FPGA across a wide range of dense and sparse applications.
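As a rough illustration of two ideas named in the abstract, the sketch below models nested parallel patterns (an outer map over an inner fold) and a banked scratchpad in plain Python. It is not Plasticine's programming interface or its actual banking scheme: the function names (`pattern_map`, `pattern_fold`, `matvec`, `bank_of`, `conflict_free`) and the modulo bank mapping are assumptions made only for this example.

```python
from functools import reduce
from typing import Callable, List, Sequence


def pattern_map(n: int, f: Callable[[int], float]) -> List[float]:
    """Map pattern: every index is independent, so iterations could run in parallel."""
    return [f(i) for i in range(n)]


def pattern_fold(n: int, zero: float, f: Callable[[int], float],
                 combine: Callable[[float, float], float]) -> float:
    """Fold pattern: map each index, then reduce with an associative combine."""
    return reduce(combine, (f(i) for i in range(n)), zero)


def matvec(a: Sequence[Sequence[float]], x: Sequence[float]) -> List[float]:
    """Nested patterns: an outer map over rows, an inner fold over columns."""
    return pattern_map(
        len(a),
        lambda i: pattern_fold(len(x), 0.0,
                               lambda j: a[i][j] * x[j],
                               lambda p, q: p + q),
    )


def bank_of(addr: int, num_banks: int) -> int:
    """Assumed banking scheme: consecutive words land in consecutive banks."""
    return addr % num_banks


def conflict_free(addrs: Sequence[int], num_banks: int) -> bool:
    """True if every lane of one vector access hits a different bank."""
    banks = [bank_of(a, num_banks) for a in addrs]
    return len(set(banks)) == len(banks)
```

Under this assumed mapping, a unit-stride access such as addresses 0 to 3 over four banks touches each bank once, while a stride-4 access sends every lane to the same bank. Conflicts of this kind are what a configurable address decoder is meant to avoid.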

Funding Information

National Science Foundation (IIS-1247701, CCF-1111943, CCF-1337375, and SHF-1408911)
Stanford PPL affiliates program, Pervasive Parallelism Lab: Oracle, AMD, Huawei, Intel, NVIDIA, SAP Labs
Army Contract AHPCRC (W911NF-07-2-0027-1)
DARPA Contract-Air Force (FA8750-12-2-0335)
