Plasticine
- 24 June 2017
- journal article
- conference paper
- Published by Association for Computing Machinery (ACM) in ACM SIGARCH Computer Architecture News
- Vol. 45 (2), 389-402
- https://doi.org/10.1145/3140659.3080256
Abstract
Reconfigurable architectures have gained popularity in recent years as they allow the design of energy-efficient accelerators. Fine-grain fabrics (e.g. FPGAs) have traditionally suffered from performance and power inefficiencies due to bit-level reconfigurable abstractions. Both fine-grain and coarse-grain architectures (e.g. CGRAs) traditionally require low level programming and suffer from long compilation times. We address both challenges with Plasticine, a new spatially reconfigurable architecture designed to efficiently execute applications composed of parallel patterns. Parallel patterns have emerged from recent research on parallel programming as powerful, high-level abstractions that can elegantly capture data locality, memory access patterns, and parallelism across a wide range of dense and sparse applications. We motivate Plasticine by first observing key application characteristics captured by parallel patterns that are amenable to hardware acceleration, such as hierarchical parallelism, data locality, memory access patterns, and control flow. Based on these observations, we architect Plasticine as a collection of Pattern Compute Units and Pattern Memory Units. Pattern Compute Units are multi-stage pipelines of reconfigurable SIMD functional units that can efficiently execute nested patterns. Data locality is exploited in Pattern Memory Units using banked scratchpad memories and configurable address decoders. Multiple on-chip address generators and scatter-gather engines make efficient use of DRAM bandwidth by supporting a large number of outstanding memory requests, memory coalescing, and burst mode for dense accesses. Plasticine has an area footprint of 113 mm² in a 28 nm process, and consumes a maximum power of 49 W at a 1 GHz clock. Using a cycle-accurate simulator, we demonstrate that Plasticine provides an improvement of up to 76.9x in performance-per-Watt over a conventional FPGA over a wide range of dense and sparse applications.
Funding Information
- National Science Foundation (IIS-1247701, CCF-1111943, CCF-1337375, and SHF-1408911)
- Stanford PPL affiliates program, Pervasive Parallelism Lab: Oracle, AMD, Huawei, Intel, NVIDIA, SAP Labs
- Army Contract AHPCRC (W911NF-07-2-0027-1)
- DARPA Contract-Air Force (FA8750-12-2-0335)
This publication has 36 references indexed in Scilit:
- Delite. ACM Transactions on Embedded Computing Systems, 2014
- GPUWattch. Published by Association for Computing Machinery (ACM), 2013
- Halide. Published by Association for Computing Machinery (ACM), 2013
- Composition and Reuse with Compiled Domain-Specific Languages. Lecture Notes in Computer Science, 2013
- Chisel. Published by Association for Computing Machinery (ACM), 2012
- FPGA Architecture: Survey and Challenges. Foundations and Trends® in Electronic Design Automation, 2008
- A detailed power model for field-programmable gate arrays. ACM Transactions on Design Automation of Electronic Systems, 2005
- ADRES: An Architecture with Tightly Coupled VLIW Processor and Coarse-Grained Reconfigurable Matrix. Lecture Notes in Computer Science, 2003
- The Raw microprocessor: a computational fabric for software circuits and general-purpose programs. IEEE Micro, 2002
- The Garp architecture and C compiler. Computer, 2000