Results: 7

(searched for: doi:10.1145/3297858.3304033)
Robert Guirado, Hyoukjun Kwon, Sergi Abadal, Eduard Alarcón, Tushar Krishna
Proceedings of the 26th Asia and South Pacific Design Automation Conference; https://doi.org/10.1145/3394885.3431537

Abstract:
Deep neural network (DNN) models continue to grow in size and complexity, demanding higher computational power to enable real-time inference. To efficiently deliver such computational demands, hardware accelerators are being developed and deployed across scales. This naturally requires an efficient scale-out mechanism for increasing compute density as required by the application. 2.5D integration over interposer has emerged as a promising solution, but as we show in this work, the limited interposer bandwidth and multiple hops in the Network-on-Package (NoP) can diminish the benefits of the approach. To cope with this challenge, we propose WIENNA, a wireless NoP-based 2.5D DNN accelerator. In WIENNA, the wireless NoP connects an array of DNN accelerator chiplets to the global buffer chiplet, providing high-bandwidth multicasting capabilities. Here, we also identify the dataflow style that most efficiently exploits the wireless NoP's high-bandwidth multicasting capability on each layer. With modest area and power overheads, WIENNA achieves 2.2X-5.1X higher throughput and 38.2% lower energy than an interposer-based NoP design.
Suraj Jog, Zikun Liu, Antonio Franques, Vimuth Fernando, Haitham Hassanieh, Sergi Abadal, Josep Torrellas
Proceedings of the SIGCOMM '20 Poster and Demo Sessions; https://doi.org/10.1145/3405837.3411396

Abstract:
Wireless Network-on-Chip (NoC) has emerged as a promising solution to scale chip multi-core processors to hundreds of cores. However, traditional medium access protocols fall short here since the traffic patterns on wireless NoCs tend to be very dynamic and can change drastically across different cores, different time intervals and different applications. In this work, we present NeuMAC, a unified approach that combines networking, architecture and AI to generate highly adaptive medium access protocols that can learn and optimize for the structure, correlations and statistics of the traffic patterns on the NoC. Our results show that NeuMAC can quickly adapt to NoC traffic to provide significant gains in terms of latency, improving the overall execution time by 1.69X to 3.74X.
Giuseppe Ascia, Vincenzo Catania, Maurizio Palesi, Davide Patti, Valerio Mario Salerno
ACM Journal on Emerging Technologies in Computing Systems, Volume 16, pp 1-27; https://doi.org/10.1145/3379448

Abstract:
The emerging wireless Network-on-Chip (WiNoC) architectures are a viable solution for addressing the scalability limitations of manycore architectures in which multi-hop long-range communications strongly impact both the performance and energy figures of the system. The energy consumption of wired links as well as that of radio communications account for a relevant fraction of the overall energy budget. In this article, we extend the approximate computing paradigm to the case of the on-chip communication system in manycore architectures. We present techniques, circuitries, and programming interfaces aimed at reducing the energy consumption of a WiNoC by exploiting the trade-off energy saving vs. application output degradation. The proposed platform—namely, xWiNoC—uses variable voltage swing links and tunable transmitting power wireless interfaces along with a programming interface that allows the programmer to specify those data structures that are error-resilient. Thus, communications induced by the access to such error-resilient data structures are carried out by using links and radio channels that are configured to work in a low energy mode, albeit by exposing a higher bit error rate. xWiNoC is assessed on a set of applications belonging to different domains in which the trade-off energy vs. performance vs. application result quality is discussed. We found that up to 50% of communication energy saving can be obtained with a negligible impact on the application output quality and 3% in application performance degradation.
Keyur Joshi, Vimuth Fernando, Sasa Misailovic
Proceedings of the 18th ACM/IEEE International Symposium on Code Generation and Optimization; https://doi.org/10.1145/3368826.3377924

Abstract:
Modern hardware is becoming increasingly susceptible to silent data corruptions. As general methods for detection and recovery from errors are time and energy consuming, selective detection and recovery are promising alternatives for applications that have the freedom to produce results with a variable level of accuracy. Several programming languages have provided specialized constructs for expressing detection and recovery operations, but the existing static analyses of safety and quantitative analyses of programs do not have the proper support for such language constructs. This work presents Aloe, a quantitative static analysis of reliability of programs with recovery blocks - a construct that checks for errors, and if necessary, applies the corresponding recovery strategy. The analysis supports reasoning about both reliable and potentially unreliable detection and recovery mechanisms. It implements a novel precondition generator for recovery blocks, built on top of Rely, a state-of-the-art quantitative reliability analysis for imperative programs. Aloe can reason about programs with scalar and array expressions, if-then-else conditionals, and bounded loops without early exits. The analyzed computation is idempotent and the recovery code re-executes the original computation. We implemented Aloe and applied it to a set of eight programs previously used in approximate computing research. Our results show significantly higher reliability and better scalability than the existing Rely analysis. Moreover, the end-to-end accuracy of the verified computations exhibits only small accuracy losses.
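The recovery-block construct the abstract describes can be illustrated with a minimal sketch. This is not Aloe's or Rely's syntax; the names (`recovery_block`, `acceptance_test`, `recover`) and the simulated corruption are illustrative assumptions. The key properties from the abstract are preserved: the detector checks the result, and the recovery path re-executes the (idempotent) original computation.

```python
# Hedged sketch of a "recovery block": run a computation, apply an
# acceptance test to its result, and fall back to a recovery strategy
# (re-executing the original computation) if the test fails.
# All names here are illustrative, not Aloe's actual API.

def recovery_block(try_block, acceptance_test, recover, x):
    result = try_block(x)
    if acceptance_test(result):
        return result
    # Error detected: re-execute the idempotent original computation.
    return recover(x)

# Example: an unreliable doubling with a simulated silent corruption.
def unreliable_double(x):
    return x * 2 + 1  # off-by-one models a silent data corruption

def check(r):
    return r % 2 == 0  # doubling an integer must yield an even result

def reliable_double(x):
    return x + x  # reliable re-execution of the original computation

print(recovery_block(unreliable_double, check, reliable_double, 21))  # 42
```

Aloe's contribution, per the abstract, is statically bounding the reliability of such a construct, including the case where the detector or the recovery path is itself unreliable.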
Vimuth Fernando, Keyur Joshi, Sasa Misailovic
Proceedings of the ACM on Programming Languages, Volume 3, pp 1-29; https://doi.org/10.1145/3360545

Abstract:
We present Parallely, a programming language and a system for verification of approximations in parallel message-passing programs. Parallely's language can express various software and hardware level approximations that reduce the computation and communication overheads at the cost of result accuracy. Parallely's safety analysis can prove the absence of deadlocks in approximate computations and its type system can ensure that approximate values do not interfere with precise values. Parallely's quantitative accuracy analysis can reason about the frequency and magnitude of error. To support such analyses, Parallely presents an approximation-aware version of canonical sequentialization, a recently proposed verification technique that generates sequential programs that capture the semantics of well-structured parallel programs (i.e., ones that satisfy a symmetric nondeterminism property). To the best of our knowledge, Parallely is the first system designed to analyze parallel approximate programs. We demonstrate the effectiveness of Parallely on eight benchmark applications from the domains of graph analytics, image processing, and numerical analysis. We also encode and study five approximation mechanisms from literature. Our implementation of Parallely automatically and efficiently proves type safety, reliability, and accuracy properties of the approximate benchmarks.
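One guarantee the abstract attributes to Parallely's type system — that approximate values do not interfere with precise values — can be sketched as a runtime check. This is a loose illustration, not Parallely's language or semantics; the `Approx` wrapper and function names are assumptions for the example.

```python
# Illustrative sketch of precise/approximate non-interference: tag
# values as approximate and forbid them from flowing into a precise
# computation. Parallely enforces this statically; this sketch only
# mimics the policy dynamically.

class Approx:
    """Wrapper marking a value as approximate (possibly erroneous)."""
    def __init__(self, value):
        self.value = value

def precise_add(a, b):
    # A precise computation must not consume approximate inputs.
    if isinstance(a, Approx) or isinstance(b, Approx):
        raise TypeError("approximate value may not flow into precise computation")
    return a + b

def approx_add(a, b):
    # Approximate computations may mix both kinds; the result stays approximate.
    unwrap = lambda v: v.value if isinstance(v, Approx) else v
    return Approx(unwrap(a) + unwrap(b))
```

The one-way flow (precise may feed approximate, never the reverse) is what lets the quantitative accuracy analysis bound error frequency and magnitude without it contaminating precise state.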