CF '22: 19th ACM International Conference on Computing Frontiers

Conference Information
Name: CF '22: 19th ACM International Conference on Computing Frontiers
Location: Turin, Italy

Latest articles from this conference

Hanna Vivien Schwarzwald, Sebastian Würl, Martin Langer, Carsten Trinitis
Proceedings of the 19th ACM International Conference on Computing Frontiers; https://doi.org/10.1145/3528416.3531531

Abstract:
Unlike terrestrial applications, most imaging nanosatellites rely on simplistic command sequences for on-board control. Combined with the unpredictable nature of flight operations, this can result in tedious and work-intensive operations, as unforeseen events may mean that the uplinked commands no longer fit the operators' needs. The restricted communication windows meanwhile require a high level of automation whilst keeping the size of uplinked sequences minimal. We therefore require a dynamic control language that is also able to operate within the resource limitations of a nanosatellite. In an effort to combine all these requirements, we chose to implement MicroPython as the control language for our satellite payload. This extended abstract introduces the architecture and concepts of our implementation. Together with our presentation at CompSpace '22, it shall serve as a basis for discussion of using MicroPython as a payload control language on nanosatellites.
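
A minimal sketch of the idea behind such a dynamic control language: a short uplinked (Micro)Python script reacts to run-time conditions instead of executing a fixed command sequence. The `Camera` and `Storage` objects and their methods below are illustrative assumptions, not the authors' actual payload API.

```python
# Hypothetical sketch of dynamic payload control in (Micro)Python.
# The hardware-facing classes are stand-ins for a real payload API.

class Camera:
    def exposure_ok(self):          # e.g. derived from attitude/illumination telemetry
        return True
    def capture(self, gain):
        return b"raw-image-bytes"

class Storage:
    def __init__(self, free_mb):
        self.free_mb = free_mb
    def save(self, data):
        self.free_mb -= len(data) / 1e6

def acquisition_pass(camera, storage, max_images=5, min_free_mb=10):
    """Uplinked as a short script: decides at run time how many images to
    take, instead of replaying a fixed, pre-planned command list."""
    taken = 0
    while taken < max_images and storage.free_mb > min_free_mb:
        if not camera.exposure_ok():
            break                   # react to unforeseen conditions on board
        storage.save(camera.capture(gain=2))
        taken += 1
    return taken

print(acquisition_pass(Camera(), Storage(free_mb=64)))
```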
Gildo Torres, Chen Liu
Proceedings of the 19th ACM International Conference on Computing Frontiers; https://doi.org/10.1145/3528416.3530226

Abstract:
In recent years, multiple techniques have been proposed to defend computing systems against control-oriented attacks that hijack the control flow of the victim program. Data-only attacks, on the other hand, are a less common and more subtle type of exploit that is more difficult to detect using traditional mitigation techniques targeting control-oriented attacks. In this paper, we introduce a novel methodology for detecting data-only attacks by modeling the execution behavior of an application using low-level hardware information collected as a data series during execution. One unique aspect of the proposed methodology is that it uses a compilation-flag-based approach to collect hardware counts, eliminating the need for manual code instrumentation. Another unique aspect is the introduction of a data compression algorithm as the classifier. Using several representative real-world data-only exploits, our experiments show that data-only attacks can be detected with high accuracy using the proposed methodology. We also analyzed how to select the most relevant hardware events for detecting the studied data-only attacks, and performed a quantitative study of the hardware events' sensitivity to interference.
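
The compression-based classification mentioned in the abstract can be illustrated with a simple compression-distance score over hardware-counter series. The sketch below is a generic illustration, not the authors' pipeline: it uses zlib compressed sizes to compare a new execution trace against known-benign baselines, with the quantization and threshold chosen arbitrarily.

```python
# Hedged sketch: compression-based anomaly scoring over hardware-counter series.
# A trace that compresses poorly together with benign baselines is flagged.
import zlib

def compressed_size(data: bytes) -> int:
    return len(zlib.compress(data, 9))

def ncd(x: bytes, y: bytes) -> float:
    """Normalized compression distance between two byte sequences."""
    cx, cy, cxy = compressed_size(x), compressed_size(y), compressed_size(x + y)
    return (cxy - min(cx, cy)) / max(cx, cy)

def to_bytes(series):
    # Quantize a counter series (e.g. events per sampling interval) into bytes.
    return bytes(min(v // 64, 255) for v in series)

def is_anomalous(trace, benign_traces, threshold=0.4):
    scores = [ncd(to_bytes(trace), to_bytes(b)) for b in benign_traces]
    return min(scores) > threshold      # far from every benign execution

benign = [[120, 130, 125, 118] * 50, [119, 128, 131, 117] * 50]
suspect = [400, 20, 395, 15] * 50       # synthetic trace with a different shape
print(is_anomalous(suspect, benign))
```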
Max Ghiglione, Vittorio Serra
Proceedings of the 19th ACM International Conference on Computing Frontiers; https://doi.org/10.1145/3528416.3530985

Abstract:
Higher autonomy in satellite operation is seen as the key game changer for the space systems market in the next decade, with a considerable number of agencies and startups focusing on bringing machine learning to space. The adoption of Artificial Intelligence on board satellites is still limited by the processing capabilities of radiation-hardened hardware, which requires flight heritage and extensive qualification. At the same time, the satellite market is undergoing a major paradigm shift from a hardware equipment perspective. Classical approaches, which aim at realizing satellites compliant with mission profiles that include a long-lasting operational life and extremely high reliability, are ill-suited for many of the new market segments. The satellite-manufacturing industry is gradually adapting to these new mission requirements by identifying segments where components off-the-shelf (COTS) can be employed. The latest generation of commercial components offers the unique possibility to integrate AI algorithms with relative ease through tool-assisted design and much higher performance in parallel processing. In this position paper, the authors introduce the state of the art of on-board AI and present the approach currently being researched at Airbus Defence and Space to perform neural network inference in various mission scenarios.
Runkai Yang, Xiaolin Chang, Jelena Mišić, Vojislav Mišić, Haoran Zhu
Proceedings of the 19th ACM International Conference on Computing Frontiers; https://doi.org/10.1145/3528416.3530248

Abstract:
The fork-after-withholding (FAW) attack is an easy-to-conduct attack on the Bitcoin system, and it is harder to detect than attacks such as selfish mining and selfholding attacks. Previous studies of the FAW attack made strong assumptions, such as the absence of propagation delay in the network. This paper aims to quantitatively examine the profitability of the FAW attack in a Bitcoin system with block propagation delay. We first establish a novel analytic model that can analyze the FAW attack in the Bitcoin system. Then we apply the model to design metric formulas for the Bitcoin system. These formulas can be used to evaluate miner profitability (in terms of miner reward) and the impact of the FAW attack on system throughput (in terms of transactions per second). We compare the FAW attack with other attacks, including selfish mining and selfholding attacks. Experimental results reveal that FAW adversaries can obtain more rewards in a network with propagation delay than in one without delay. The comparison of selfish mining and FAW attacks shows that adversaries with large computational power can conduct selfish mining or selfholding attacks to gain more rewards, but adversaries with small computational power profit more by conducting the FAW attack. Our work can be used to analyze Bitcoin-like blockchain systems and to help design and evaluate security mechanisms.
Zewei Mo, Zejia Lin, Xianwei Zhang, Yutong Lu
Proceedings of the 19th ACM International Conference on Computing Frontiers; https://doi.org/10.1145/3528416.3530231

Abstract:
Arithmetic operators are now used in a wide spectrum of domains, including artificial intelligence, data analytics, and scientific computing. Meanwhile, specialized hardware components that enable low-precision computing are increasingly deployed in GPUs and accelerators. While promising to boost performance, accelerating the operators on such hardware necessitates manually tuning the mixed-precision knobs to balance performance and accuracy, which can be extremely challenging in practice. To address this issue, we present moTuner, an automatic framework for efficiently tuning mixed-precision operators. moTuner works at the compiler level to automatically enable mixed-precision computation, without requiring any manual modification of the source code or the operator library, thus significantly alleviating the programming burden. Because it is implemented in the compilation phase, moTuner is more widely applicable and requires less effort on the libraries. Further, moTuner adopts an optimized search strategy during tuning to effectively narrow down the configuration space. Evaluations on GEMM operators and real applications demonstrate that moTuner achieves performance improvements of up to 3.13x and 1.15x, respectively, while guaranteeing considerably high accuracy.
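
As a rough illustration of the knob-tuning problem that moTuner automates, the toy sketch below times a GEMM at several precisions with NumPy, measures error against a float64 reference, and keeps the fastest precision that meets an error bound. This is a naive brute-force search for illustration only, not moTuner's compiler-level mechanism or GPU code.

```python
# Toy mixed-precision GEMM tuning: pick the fastest precision within an error bound.
import time
import numpy as np

def tune_gemm_precision(m=512, n=512, k=512, max_rel_error=1e-3):
    rng = np.random.default_rng(0)
    a64 = rng.standard_normal((m, k))
    b64 = rng.standard_normal((k, n))
    reference = a64 @ b64                       # float64 reference result

    best = None
    for dtype in (np.float16, np.float32, np.float64):
        a, b = a64.astype(dtype), b64.astype(dtype)
        start = time.perf_counter()
        c = a @ b
        elapsed = time.perf_counter() - start
        rel_err = np.linalg.norm(c.astype(np.float64) - reference) / np.linalg.norm(reference)
        if rel_err <= max_rel_error and (best is None or elapsed < best[1]):
            best = (np.dtype(dtype).name, elapsed, rel_err)
    return best

print(tune_gemm_precision())                    # e.g. ('float32', <seconds>, <error>)
```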
Nanmiao Wu, Vito Giovanni Castellana, Hartmut Kaiser
Proceedings of the 19th ACM International Conference on Computing Frontiers; https://doi.org/10.1145/3528416.3530784

Abstract:
As the complexity of hardware architectures and software stacks grows, development productivity, performance, and software portability quickly evolve from desirable features into actual needs. SHAD, the Scalable High-performance Algorithms and Data-structures C++ library, is designed to mitigate these issues: it provides general-purpose building blocks as well as high-level custom utilities, and offers a shared-memory programming abstraction that facilitates the programming of complex systems, scaling up to High Performance Computing clusters. SHAD's portability is achieved through an abstract runtime interface, which decouples the upper layers of the library and hides the low-level details of the underlying architecture. This layer enables SHAD to interface with different runtime/threading systems, e.g., Intel TBB and Global Memory and Threading (GMT). However, the current backends targeting distributed systems rely on a centralized controller, which may limit scalability beyond hundreds of nodes and creates a network hot spot due to all-to-one communication for synchronization, possibly resulting in degraded performance at high process counts. In this research, we explore HPX, the C++ standard library for parallelism and concurrency, as an additional backend for the SHAD library, and present the methodologies supporting local and remote task execution in SHAD on top of HPX. Finally, we evaluate the proposed system by comparing it against the existing SHAD backends and analyzing their performance on C++ Standard Template Library algorithms.
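
The decoupling described above, where library code targets an abstract runtime interface and backends supply the actual threading or distribution, can be sketched generically. The Python below is purely conceptual and is not SHAD's C++ API: a `Runtime` interface with two interchangeable backends.

```python
# Conceptual sketch of an abstract runtime interface decoupling library code
# from its threading backend (illustration only; SHAD itself is a C++ library).
from abc import ABC, abstractmethod
from concurrent.futures import ThreadPoolExecutor

class Runtime(ABC):
    @abstractmethod
    def execute_at(self, locality, fn, *args):
        """Run fn(*args) on the given locality and return its result."""

class SerialRuntime(Runtime):
    def execute_at(self, locality, fn, *args):
        return fn(*args)                        # single locality, run inline

class ThreadPoolRuntime(Runtime):
    def __init__(self, workers=4):
        self.pool = ThreadPoolExecutor(max_workers=workers)
    def execute_at(self, locality, fn, *args):
        return self.pool.submit(fn, *args).result()

def library_sum(runtime, chunks):
    # Upper-layer "library" code: unaware of which backend executes the tasks.
    return sum(runtime.execute_at(i, sum, chunk) for i, chunk in enumerate(chunks))

data = [list(range(1000)) for _ in range(4)]
print(library_sum(SerialRuntime(), data), library_sum(ThreadPoolRuntime(), data))
```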
Nicolas Bohm Agostini, Serena Curzel, David Kaeli, Antonino Tumeo
Proceedings of the 19th ACM International Conference on Computing Frontiers; https://doi.org/10.1145/3528416.3530866

Abstract:
Due to technology and power limitations, general-purpose processing units are experiencing progressively smaller performance gains. Computer architecture innovations are essential to keep performance steadily increasing; thus, domain-specific accelerators are receiving renewed interest and have been shown to benefit various scientific and machine learning applications [1, 3]. High-Level Synthesis (HLS) provides a way to quickly generate hardware descriptions for domain-specific accelerators starting from high-level applications. However, state-of-the-art tools typically require the application to be manually translated to C/C++ and carefully annotated to improve final design performance. This cumbersome process prevents scientists and researchers from tapping into the power of HLS, as many of their applications require significant effort to be ported.
Martin Molan, Andrea Borghesi, Luca Benini, Andrea Bartolini
Proceedings of the 19th ACM International Conference on Computing Frontiers; https://doi.org/10.1145/3528416.3530867

Abstract:
Automated and data-driven methodologies are being introduced to assist system administrators in managing increasingly complex modern HPC systems. Anomaly detection (AD) is an integral part of improving overall availability, as it eases the system administrators' burden and reduces the time between an anomaly and its resolution. This work improves upon the current state-of-the-art (SoA) AD model by considering temporal dependencies in the data and including long short-term memory (LSTM) cells in the architecture of the AD model. The proposed model is evaluated on a complete ten-month history of a Tier-0 system (Marconi100 at CINECA, consisting of 985 nodes). The proposed model achieves an area under the curve (AUC) of 0.758, improving upon the state-of-the-art approach, which achieves an AUC of 0.747.
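
A minimal sketch of the general approach described in the abstract, scoring per-node telemetry windows by the reconstruction error of an LSTM-based model. The layer sizes, window length, synthetic data, and threshold below are illustrative assumptions, not the authors' architecture or the Marconi100 dataset.

```python
# Hedged sketch of LSTM-based anomaly detection on telemetry windows (PyTorch).
import torch
import torch.nn as nn

class LSTMAutoencoder(nn.Module):
    def __init__(self, n_features=8, hidden=32):
        super().__init__()
        self.encoder = nn.LSTM(n_features, hidden, batch_first=True)
        self.decoder = nn.LSTM(hidden, hidden, batch_first=True)
        self.output = nn.Linear(hidden, n_features)

    def forward(self, x):                       # x: (batch, time, features)
        _, (h, _) = self.encoder(x)
        # Repeat the final hidden state as decoder input at every time step.
        repeated = h[-1].unsqueeze(1).repeat(1, x.size(1), 1)
        decoded, _ = self.decoder(repeated)
        return self.output(decoded)

model = LSTMAutoencoder()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.MSELoss()

normal = torch.randn(64, 60, 8) * 0.1           # synthetic "healthy" windows
for _ in range(20):                              # brief training loop
    optimizer.zero_grad()
    loss = loss_fn(model(normal), normal)
    loss.backward()
    optimizer.step()

with torch.no_grad():                            # flag high-reconstruction-error windows
    window = torch.randn(1, 60, 8)               # a window with unusual behavior
    error = loss_fn(model(window), window).item()
print("anomalous" if error > 0.05 else "normal", error)
```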
Yang Yang, Sanmukh R. Kuppannagari, Rajgopal Kannan, Viktor K. Prasanna
Proceedings of the 19th ACM International Conference on Computing Frontiers; https://doi.org/10.1145/3528416.3530225

Abstract:
Homomorphic encryption (HE) is a promising technique to ensure the security and privacy of applications in the cloud. The Number Theoretic Transform (NTT) is a key operation in HE-based applications. HE requires vastly different NTT parameters to meet the performance and security requirements of applications. The increasing compute capabilities and flexibility of FPGAs make them attractive for accelerating NTT. However, programming FPGAs still requires hardware design expertise and significant development effort. To close the gap, we propose NTTGen, a framework to automatically generate low-latency NTT designs targeting HE-based applications. NTTGen takes application parameters, latency, and hardware resource constraints as input, determines the design parameters, and produces synthesizable Verilog code as output. Low-latency NTT implementations are obtained by varying the data, pipeline, and batch parallelism. NTTGen utilizes a streaming permutation network to reduce the interconnect complexity between stages in the NTT computation. The framework supports two types of NTT cores to perform modular arithmetic, the key computation in NTT: a low-latency, resource-efficient NTT core for a specific class of prime moduli and a general-purpose NTT core for other primes. We further develop a design space exploration flow to identify the hardware design parameters of an optimal design. We evaluate NTTGen by generating designs for various NTT parameters. The designs yield up to a 2.9x improvement in latency over state-of-the-art FPGA implementations.
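
For readers unfamiliar with the core computation, the sketch below is a plain software reference of a radix-2 NTT over a prime field, using the common modulus 998244353 with primitive root 3 for illustration. It shows the butterfly structure and modular arithmetic that such generators map to hardware; it is not NTTGen's output or its supported parameter set.

```python
# Software reference for the radix-2 NTT (illustrative modulus and root).
def ntt(a, p=998244353, g=3, invert=False):
    """Iterative NTT over Z_p; len(a) must be a power of two dividing p - 1."""
    a = list(a)
    n = len(a)
    # Bit-reversal permutation.
    j = 0
    for i in range(1, n):
        bit = n >> 1
        while j & bit:
            j ^= bit
            bit >>= 1
        j |= bit
        if i < j:
            a[i], a[j] = a[j], a[i]
    length = 2
    while length <= n:
        w_len = pow(g, (p - 1) // length, p)
        if invert:
            w_len = pow(w_len, p - 2, p)        # inverse root via Fermat's little theorem
        for start in range(0, n, length):
            w = 1
            for k in range(start, start + length // 2):
                u = a[k]
                v = a[k + length // 2] * w % p
                a[k] = (u + v) % p              # butterfly: the modular-arithmetic
                a[k + length // 2] = (u - v) % p    # unit an NTT core implements
                w = w * w_len % p
        length <<= 1
    if invert:
        n_inv = pow(n, p - 2, p)
        a = [x * n_inv % p for x in a]
    return a

x = [3, 1, 4, 1, 5, 9, 2, 6]
assert ntt(ntt(x), invert=True) == x            # forward/inverse round trip
```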
Dante Niewenhuis, Ana-Lucia Varbanescu
Proceedings of the 19th ACM International Conference on Computing Frontiers; https://doi.org/10.1145/3528416.3530247

Abstract:
Strongly Connected Components (SCCs) are useful for many applications, such as community detection and personalized recommendation. Determining the SCCs of a graph, however, can be very expensive, and parallelization is not an easy way out: the parallelization itself is challenging, and its performance impact varies non-trivially with the input graph structure. This variability is due to trivial components, i.e., SCCs consisting of a single vertex, which lead to significant workload imbalance. Trimming is an effective method to remove trivial components, but it is inefficient when used on graphs with few trivial components. In this work, we propose FB-AI-Trim, a parallel SCC algorithm with selective trimming. Our algorithm decides dynamically at runtime, based on the input graph, whether and how to trim it. To this end, we train a neural network to predict, using topological graph information, whether trimming is beneficial for performance. We evaluate FB-AI-Trim using 173 unseen graphs and compare it against four different static trimming models. Our results demonstrate that, over the set of graphs, FB-AI-Trim is the fastest algorithm. Furthermore, in 80% of the cases, FB-AI-Trim is less than 10% slower than the best-performing model on a single graph. Finally, FB-AI-Trim shows significant performance degradation on less than 3% of the graphs.
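
Trimming, as referred to above, repeatedly removes vertices whose in-degree or out-degree is zero, since each such vertex forms a trivial single-vertex SCC. The sketch below is a plain sequential illustration of that preprocessing step, not the parallel FB-AI-Trim algorithm or its learned trimming decision.

```python
# Sequential sketch of trivial-SCC trimming: peel off vertices with zero
# in-degree or zero out-degree, each of which is its own trivial SCC.
def trim_trivial_sccs(edges, vertices):
    remaining = set(vertices)
    trivial = []
    changed = True
    while changed:
        changed = False
        in_deg = {v: 0 for v in remaining}
        out_deg = {v: 0 for v in remaining}
        for u, v in edges:
            if u in remaining and v in remaining:
                out_deg[u] += 1
                in_deg[v] += 1
        for v in list(remaining):
            if in_deg[v] == 0 or out_deg[v] == 0:
                trivial.append({v})
                remaining.remove(v)
                changed = True
    return trivial, remaining                   # trivial SCCs and the residual vertices

edges = [(0, 1), (1, 2), (2, 0), (2, 3), (3, 4)]
print(trim_trivial_sccs(edges, range(5)))       # vertices 4 and 3 are trimmed away
```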