ACM Transactions on Design Automation of Electronic Systems
ISSN / EISSN : 1084-4309 / 1557-7309
Published by: Association for Computing Machinery (ACM) (10.1145)
Total articles ≅ 1,208
Latest articles in this journal
Published: 1 August 2021
ACM Transactions on Design Automation of Electronic Systems, Volume 26, pp 1-20; https://doi.org/10.1145/3460972
In this article, we present a low-energy inference method for convolutional neural networks in image classification applications. The lower energy consumption is achieved by using a highly pruned (lower-energy) network if the resulting network can provide a correct output. More specifically, the proposed inference method makes use of two pruned neural networks (NNs), namely mildly and aggressively pruned networks, which are both designed offline. In the system, a third NN makes use of the input data for the online selection of the appropriate pruned network. The third network, for its feature extraction, employs the same convolutional layers as those of the aggressively pruned NN, thereby reducing the overhead of the online management. There is some accuracy loss induced by the proposed method where, for a given level of accuracy, the energy gain of the proposed method is considerably larger than the case of employing any one pruning level. The proposed method is independent of both the pruning method and the network architecture. The efficacy of the proposed inference method is assessed on Eyeriss hardware accelerator platform for some of the state-of-the-art NN architectures. Our studies show that this method may provide, on average, 70% energy reduction compared to the original NN at the cost of about 3% accuracy loss on the CIFAR-10 dataset.
Published: 1 August 2021
ACM Transactions on Design Automation of Electronic Systems, Volume 26, pp 1-17; https://doi.org/10.1145/3460289
Single flux quantum (SFQ) logic is a promising technology to replace complementary metal-oxide-semiconductor logic for future exa-scale supercomputing but requires the development of reliable EDA tools that are tailored to the unique characteristics of SFQ circuits, including the need for active splitters to support fanout and clocked logic gates. This article is the first work to present a physical design methodology for inserting hold buffers in SFQ circuits. Our approach is variation-aware, uses common path pessimism removal and incremental placement to minimize the overhead of timing fixes, and can trade off layout area and timing yield. Compared to a previously proposed approach using fixed hold time margins, Monte Carlo simulations show that, averaging across 10 ISCAS’85 benchmark circuits, our proposed method can reduce the number of inserted hold buffers by 8.4% with a 6.2% improvement in timing yield and by 21.9% with a 1.7% improvement in timing yield.
Published: 30 July 2021
ACM Transactions on Design Automation of Electronic Systems, Volume 26, pp 1-36; https://doi.org/10.1145/3460437
Digital Microfluidics is an emerging technology for automating laboratory procedures in biochemistry. With more and more complex biochemical protocols getting mapped to biochip devices and microfluidics receiving a wide adoption, it is becoming indispensable to develop automated tools and synthesis platforms that can enable a smooth transformation from complex cumbersome benchtop laboratory procedures to biochip execution. Given an informal/semi-formal assay description and a target microfluidic grid architecture on which the assay has to be implemented, a synthesis tool typically translates the high-level assay operations to low-level actuation sequences that can drive the assay realization on the grid. With more and more complex biochemical assay protocols being taken up for synthesis and biochips supporting a wider variety of operations (e.g., MicroElectrode Dot Arrays (MEDAs)), the task of assay synthesis is getting intricately complex. Errors in the synthesized assay descriptions may have undesirable consequences in assay operations, leading to unacceptable outcomes after execution on the biochips. In this work, we focus on the challenge of examining the correctness of synthesized protocol descriptions, before they are taken up for realization on a microfluidic biochip. In particular, we take up a protocol description synthesized for a MEDA biochip and adopt a formal analysis method to derive correctness proofs or a violation thereof, pointing to the exact operation in the erroneous translation. We present experimental results on a few bioassay protocols and show the utility of our framework for verifiable protocol synthesis.
Published: 28 June 2021
ACM Transactions on Design Automation of Electronic Systems, Volume 26, pp 1-12; https://doi.org/10.1145/3462171
As the demand of safety-critical applications (e.g., automobile electronics) increases, various radiation-hardened flip-flops are proposed for enhancing design reliability. Among all flip-flops, Delay-Adjustable D-Flip-Flop (DAD-FF) is specialized in arbitrarily adjusting delay in the design to tolerate soft errors induced by different energy levels. However, due to a lack of testability on DAD-FF, its soft-error tolerability is not yet verified, leading to uncertain design reliability. Therefore, this work proposes Delay-Adjustable, Self-Testable Flip-Flop (DAST-FF), built on top of DAD-FF with two extra MUXs (one for scan test and the other for latching-delay verification) to achieve both soft-error tolerability and testability. Meanwhile, a built-in self-test method is also developed on DAST-FFs to verify the cumulative latching delay before operation. The experimental result shows that for a design with 8,802 DAST-FFs, the built-in self-test method only takes 946 ns to ensure the soft-error tolerability. As to the testability, the enhanced scan capability can be enabled by inserting one extra transmission gate into DAST-FF with only 4.5 area overhead.
Published: 28 June 2021
ACM Transactions on Design Automation of Electronic Systems, Volume 26, pp 1-25; https://doi.org/10.1145/3460229
Field Programmable Gate Arrays ( FPGAs ) are increasingly used in cloud applications and being integrated into Systems-on-Chip. For these systems, various side-channel attacks on cryptographic implementations have been reported, motivating one to apply proper countermeasures. Beyond cryptographic implementations, maliciously introduced covert channel receivers and transmitters can allow one to exfiltrate other secret information from the FPGA. In this article, we present a fast covert channel on FPGAs, which exploits the on-chip power distribution network. This can be achieved without any logical connection between the transmitter and receiver blocks. Compared to a recently published covert channel with an estimated 4.8 Mbit/s transmission speed, we show 8 Mbit/s transmission and reduced errors from around 3% to less than 0.003%. Furthermore, we demonstrate proper transmissions of word-size messages and test the channel in the presence of noise generated from other residing tenants’ modules in the FPGA. When we place and operate other co-tenant modules that require 85% of the total FPGA area, the error rate increases to 0.02%, depending on the platform and setup. This error rate is still reasonably low for a covert channel. Overall, the transmitter and receiver work with less than 3–5% FPGA LUT resources together. We also show the feasibility of other types of covert channel transmitters, in the form of synchronous circuits within the FPGA.
Published: 28 June 2021
ACM Transactions on Design Automation of Electronic Systems, Volume 26, pp 1-22; https://doi.org/10.1145/3460230
Digital microfluidic biochips (DMFBs) have been a revolutionary platform for automating and miniaturizing laboratory procedures with the advantages of flexibility and reconfigurability. The placement problem is one of the most challenging issues in the design automation of DMFBs. It contains three interacting NP-hard sub-problems: resource binding, operation scheduling, and module placement. Besides, during the optimization of placement, complex constraints must be satisfied to guarantee feasible solutions, such as precedence constraints, storage constraints, and resource constraints. In this article, a new placement method for DMFB is proposed based on an evolutionary algorithm with novel heuristic-based decoding strategies for both operation scheduling and module placement. Specifically, instead of using the previous list scheduler and path scheduler for decoding operation scheduling chromosomes, we introduce a new heuristic scheduling algorithm (called order scheduler) with fewer limitations on the search space for operation scheduling solutions. Besides, a new 3D placer that combines both scheduling and placement is proposed where the usage of the microfluidic array over time in the chip is recorded flexibly, which is able to represent more feasible solutions for module placement. Compared with the state-of-the-art placement methods (T-tree and 3D-DDM), the experimental results demonstrate the superiority of the proposed method based on several real-world bioassay benchmarks. The proposed method can find the optimal results with the minimum assay completion time for all test cases.
Published: 28 June 2021
ACM Transactions on Design Automation of Electronic Systems, Volume 26, pp 1-18; https://doi.org/10.1145/3460436
Compute-in-memory (CIM) is an attractive solution to address the “memory wall” challenges for the extensive computation in deep learning hardware accelerators. For custom ASIC design, a specific chip instance is restricted to a specific network during runtime. However, the development cycle of the hardware is normally far behind the emergence of new algorithms. Although some of the reported CIM-based architectures can adapt to different deep neural network (DNN) models, few details about the dataflow or control were disclosed to enable such an assumption. Instruction set architecture (ISA) could support high flexibility, but its complexity would be an obstacle to efficiency. In this article, a runtime reconfigurable design methodology of CIM-based accelerators is proposed to support a class of convolutional neural networks running on one prefabricated chip instance with ASIC-like efficiency. First, several design aspects are investigated: (1) the reconfigurable weight mapping method; (2) the input side of data transmission, mainly about the weight reloading; and (3) the output side of data processing, mainly about the reconfigurable accumulation. Then, a system-level performance benchmark is performed for the inference of different DNN models, such as VGG-8 on a CIFAR-10 dataset and AlexNet GoogLeNet, ResNet-18, and DenseNet-121 on an ImageNet dataset to measure the trade-offs between runtime reconfigurability, chip area, memory utilization, throughput, and energy efficiency.
Published: 28 June 2021
ACM Transactions on Design Automation of Electronic Systems, Volume 26, pp 1-20; https://doi.org/10.1145/3460971
This article discusses the high-performance near-memory neural network (NN) accelerator architecture utilizing the logic die in three-dimensional (3D) High Bandwidth Memory– (HBM) like memory. As most of the previously reported 3D memory-based near-memory NN accelerator designs used the Hybrid Memory Cube (HMC) memory, we first focus on identifying the key differences between HBM and HMC in terms of near-memory NN accelerator design. One of the major differences between the two 3D memories is that HBM has the centralized through- silicon-via (TSV) channels while HMC has distributed TSV channels for separate vaults. Based on the observation, we introduce the Round-Robin Data Fetching and Groupwise Broadcast schemes to exploit the centralized TSV channels for improvement of the data feeding rate for the processing elements. Using synthesized designs in a 28-nm CMOS technology, performance and energy consumption of the proposed architectures with various dataflow models are evaluated. Experimental results show that the proposed schemes reduce the runtime by 16.4–39.3% on average and the energy consumption by 2.1–5.1% on average compared to conventional data fetching schemes.
Published: 5 June 2021
ACM Transactions on Design Automation of Electronic Systems, Volume 26, pp 1-46; https://doi.org/10.1145/3451179
With the down-scaling of CMOS technology, the design complexity of very large-scale integrated is increasing. Although the application of machine learning (ML) techniques in electronic design automation (EDA) can trace its history back to the 1990s, the recent breakthrough of ML and the increasing complexity of EDA tasks have aroused more interest in incorporating ML to solve EDA tasks. In this article, we present a comprehensive review of existing ML for EDA studies, organized following the EDA hierarchy.
Published: 5 June 2021
ACM Transactions on Design Automation of Electronic Systems, Volume 26, pp 1-36; https://doi.org/10.1145/3443706
Side channel analysis attacks employ the emanated side channel information to deduce the secret keys from cryptographic implementations by analyzing the power traces during execution or scrutinizing faulty outputs. To be effective, a countermeasure must remove or conceal as many as possible side channels. However, many of the countermeasures against side channel attacks are applied independently. In this article, the authors present a novel countermeasure (referred to as QuadSeal ) against Power Analysis Attacks and Electromagentic Fault Injection Attacks (FIAs), which is an extension of the work proposed in Reference . The proposed solution relies on algorithmically balancing both Hamming distances and Hamming weights (where the bit transitions on the registers and gates are balanced, and the total number of 1s and 0s are balanced) by the use of four identical circuits with differing inputs and modified SubByte tables. By randomly rotating the four encryptions, the system is protected against variations, path imbalances, and aging effects. After generating the ciphertext, the output of each circuit is compared against each other to detect any fault injections or to correct the faulty ciphertext to gain reliability. The proposed countermeasure allows components to be switched off to save power or to run four executions in parallel for high performance when resistance against power analysis attacks is not of high priority, which is not available with the existing countermeasures (except software based where source code can be changed). The proposed countermeasure is implemented for Advanced Encryption Standard (AES) and tested against Correlation Power Analysis and Mutual Information Attacks attacks (for up to a million traces), and none of the secret keys was found even after one million power traces (the unprotected AES circuit is vulnerable for power analysis attacks within 5,000 power traces). A detection circuit (referred to as C-FIA circuit) is operated using the algorithmic redundancy presented in four circuits of QuadSeal to mitigate Electromagnetic Fault Injection Attacks. Using Synopsys PrimeTime, we measured the power dissipation of QuadSeal registers and XOR gates to test the effectiveness of Quadruple balancing methodology. We tested the QuadSeal countermeasure with C-FIA circuit against Differential Fault Analysis Attacks up to one million traces; no bytes of the secret key were found. This is the smallest known circuit that is capable of withstanding power-based side channel attacks when electromagnetic injection attack resistance, process variations, path imbalances, and aging effects are considered.