Journal of Systems Architecture; https://doi.org/10.1016/j.sysarc.2022.102567
APL Photonics, Volume 7; https://doi.org/10.1063/5.0070992
Digital accelerators built in the latest generation of complementary metal–oxide–semiconductor (CMOS) processes support multiply-and-accumulate (MAC) operations at energy efficiencies of 10–100 fJ/Op. However, such MAC operations are often limited to clock rates of a few hundred MHz. Optical or optoelectronic MAC operations on today's silicon-on-insulator (SOI) silicon photonic integrated circuit platforms can run at tens of GHz, which yields much lower latency and higher throughput. In this Perspective, we study the energy efficiency of integrated silicon photonic MAC circuits based on Mach–Zehnder modulators and microring resonators. We derive bounds on energy efficiency and scaling limits for N × N optical networks built with today's technology, based on the optical and electrical link budgets. We also describe research directions that could overcome the current limitations.
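The fJ/Op figures above come from dividing power by operation rate. A minimal sketch of that arithmetic, assuming (as a simplification, not the authors' link-budget model) that an N × N network completes N² MACs per symbol at a single clock or modulation rate, and that all laser, modulator, and converter power is lumped into one hypothetical `total_power_w` value:

```python
def macs_per_second(n: int, clock_hz: float) -> float:
    """Throughput of an N x N matrix-vector unit: N*N MACs per clock/symbol (assumed)."""
    return n * n * clock_hz


def energy_per_mac_fj(total_power_w: float, n: int, clock_hz: float) -> float:
    """Energy per MAC in femtojoules: joules per second divided by MACs per second."""
    joules_per_mac = total_power_w / macs_per_second(n, clock_hz)
    return joules_per_mac * 1e15  # convert J to fJ


# Hypothetical operating points, chosen only to illustrate the arithmetic.
optical = energy_per_mac_fj(total_power_w=1.0, n=64, clock_hz=10e9)
digital = energy_per_mac_fj(total_power_w=1.0, n=64, clock_hz=500e6)
print(f"optical: {optical:.2f} fJ/MAC, digital: {digital:.2f} fJ/MAC")
```

With the same lumped power and array size, raising the rate from 500 MHz to 10 GHz increases throughput 20× and cuts energy per MAC by the same factor; in practice total power also grows with rate and N, which is what a link-budget analysis has to capture.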
Published: 7 September 2021
Computational Linguistics and Intelligent Text Processing pp 145-156; https://doi.org/10.1007/978-3-030-86340-1_12
Conference: International Conference on Artificial Neural Networks, 14 September 2021 - 17 September 2021, Bratislava, Slovakia
Published: 7 September 2020
Proceedings of the 2020 on Great Lakes Symposium on VLSI; https://doi.org/10.1145/3386263.3407581
Conference: GLSVLSI '20: Great Lakes Symposium on VLSI 2020
Processing-in-memory (PIM) architectures have gained significant importance as an alternative to the von Neumann architecture for alleviating the memory wall and technology scaling problems. PIM architectures have achieved significant latency and energy improvements for various emerging and widely used workloads such as deep neural networks, graph analytics, databases, and computational genomics. In this work, we propose a PIM-based accelerator architecture (IMC-Sort) for sorting. Sorting is one of the fundamental and most widely used algorithms in applications such as databases, networking, and data analytics. The IMC-Sort architecture augments the Hybrid Memory Cube (HMC) memory system by incorporating a custom sorting network in the logic layer of each HMC vault. IMC-Sort uses an optimized folded bitonic sort and merge network to sort input sequences of arbitrary length at each vault, and an optimized address mapping mechanism to distribute the input data across HMC vaults. The sorted results of individual vaults are also merged using the vaults' sorting networks, which communicate with one another through the HMC's crossbar network. Overall, IMC-Sort achieves 16.8x and 1.1x speedups and 375.5x and 13.6x energy savings compared to a widely used CPU implementation and a state-of-the-art near-memory custom sort accelerator, respectively.
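A software sketch of the bitonic sorting network idea may help readers unfamiliar with it. This is the classic iterative bitonic sort, not the authors' hardware design: it does not model "folding" (reusing fewer comparators over several cycles), and arbitrary lengths are handled here by padding to a power of two with +inf sentinels, which is an assumption. The vault step is also hypothetical: `sort_across_vaults` and its round-robin split stand in for the paper's address mapping, and `heapq.merge` stands in for merging over the HMC crossbar.

```python
import heapq
from typing import List


def bitonic_sort(values: List[int]) -> List[int]:
    """Sort ascending with an iterative bitonic network (padded to a power of two)."""
    n = 1
    while n < len(values):
        n *= 2
    data = list(values) + [float("inf")] * (n - len(values))  # sentinels sort last
    k = 2
    while k <= n:  # length of the bitonic sequences being merged in this phase
        j = k // 2
        while j > 0:  # compare-exchange distance within one network stage
            for i in range(n):
                partner = i ^ j
                if partner > i:
                    ascending = (i & k) == 0  # direction of this sub-sequence
                    if ascending:
                        should_swap = data[i] > data[partner]
                    else:
                        should_swap = data[i] < data[partner]
                    if should_swap:
                        data[i], data[partner] = data[partner], data[i]
            j //= 2
        k *= 2
    return data[: len(values)]  # drop the +inf padding


def sort_across_vaults(values: List[int], num_vaults: int = 4) -> List[int]:
    """Hypothetical: split data round-robin over vaults, sort each, then merge."""
    chunks = [values[v::num_vaults] for v in range(num_vaults)]
    sorted_chunks = [bitonic_sort(chunk) for chunk in chunks]
    return list(heapq.merge(*sorted_chunks))
```

For example, `sort_across_vaults([5, 3, 9, 1, 7, 2, 8])` returns `[1, 2, 3, 5, 7, 8, 9]`. Every compare-exchange within a stage is independent, which is why the network maps naturally onto parallel comparator hardware.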