Refine Search

New Search

Results: 24

(searched for: doi:10.1145/3352460.3358274)
Save to Scifeed
Page of 1
Articles per Page
by
Show export options
  Select all
Jacob Fustos, Michael Bechtel,
Journal of Cryptographic Engineering pp 1-13; https://doi.org/10.1007/s13389-022-00289-8

The publisher has not yet granted permission to display this abstract.
Lutan Zhao, Peinan Li, Rui Hou, Michael C. Huang, Xuehai Qian, Lixin Zhang, Dan Meng
2022 IEEE International Symposium on High-Performance Computer Architecture (HPCA) pp 346-359; https://doi.org/10.1109/hpca53966.2022.00033

Abstract:
Recently exposed vulnerabilities reveal the necessity to improve the security of branch predictors. Branch predictors record history about the execution of different processes, and such information from different processes are stored in the same structure and thus accessible to each other. This leaves the attackers with the opportunities for malicious training and malicious perception. Physical or logical isolation mechanisms such as using dedicated tables and flushing during context-switch can provide security but incur non-trivial costs in space and/or execution time. Randomization mechanisms incurs the performance cost in a different way: those with higher securities add latency to the critical path of the pipeline, while the simpler alternatives leave vulnerabilities to more sophisticated attacks.This paper proposes HyBP, a practical hybrid protection and effective mechanism for building secure branch predictors. The design applies the physical isolation and randomization in the right component to achieve the best of both worlds. We propose to protect the smaller tables with physically isolation based on (thread, privilege) combination; and protect the large tables with randomization. Surprisingly, the physical isolation also significantly enhances the security of the last-level tables by naturally filtering out accesses, reducing the information flow to these bigger tables. As a result, key changes can happen less frequently and be performed conveniently at context switches. Moreover, we propose a latency hiding design for a strong cipher by precomputing the "code book" with a validated, cryptographically strong cipher. Overall, our design incurs a performance penalty of 0.5% compared to 5.1% of physical isolation under the default context switching interval in Linux.
Gururaj Saileshwar, Rick Boivie, Tong Chen, Benjamin Segal, Alper Buyuktosunoglu
ACM Transactions on Architecture and Code Optimization, Volume 19, pp 1-24; https://doi.org/10.1145/3495152

Abstract:
Programs written in C/C++ are vulnerable to memory-safety errors like buffer-overflows and use-after-free. While several mechanisms to detect such errors have been previously proposed, they suffer from a variety of drawbacks, including poor performance, imprecise or probabilistic detection of errors, or requiring invasive changes to the ISA, binary-layout, or source-code that results in compatibility issues. As a result, memory-safety errors continue to be hard to detect and a principal cause of security problems. In this work, we present a minimally invasive and low-cost hardware-based memory-safety checking framework for detecting out-of-bounds accesses and use-after-free errors. The key idea of our mechanism is to re-purpose some of the “unused bits” in a pointer in 64-bit architectures to store an index into a bounds information table that can be used to catch out-bounds errors and use-after-free errors without any change to the binary layout. Using this memory-safety checking framework, we enable HeapCheck, a design for detecting Out-of-bounds and Use-after-free accesses for heap-objects, that are responsible for the majority of memory-safety errors in the wild. Our evaluations using C/C++ SPEC CPU 2017 workloads on Gem5 show that our solution incurs 1.5% slowdown on average, using an 8 KB on-chip SRAM cache for caching bounds-information. Our mechanism allows detection of out-of-bounds errors in user-code as well as in unmodified shared-library functions. Our mechanism has detected out-of-bounds accesses in 87 lines of code in the SPEC CPU 2017 benchmarks, primarily in Glibc v2.27 functions, that, to our knowledge, have not been previously detected even with popular tools like Address Sanitizer.
Ajeya Naithani, Sam Ainsworth, ,
IEEE Micro, Volume 42, pp 116-123; https://doi.org/10.1109/mm.2022.3163132

Abstract:
Vector Runahead delivers extremely high memory-level parallelism even for chains of dependent memory accesses with complex intermediate address computation, which conventional runahead techniques fundamentally cannot handle and therefore have ignored. It does this by rearchitecting runahead to use speculative data-level parallelism, rather than work-skipping, as its primary form of extracting more memory-level parallelism in runahead mode than a true execution can, which we hope will bring about an entirely new dimension for high-performance processors.
Jonathan Behrens, Adam Belay, M. Frans Kaashoek
Proceedings of the Seventeenth European Conference on Computer Systems; https://doi.org/10.1145/3492321.3519559

Abstract:
Today's applications pay a performance penalty for mitigations to protect against transient execution attacks such as Meltdown [32] and Spectre [25]. Such a reduction in performance directly translates to higher operating costs and degraded user experience. This paper measures the performance impact of these mitigations across a range of processors from multiple vendors and across several security boundaries to identify trends over successive generations of processors and to attribute how much of the overall slowdown is caused by each individual mitigation. We find that overheads for operating system intensive workloads have declined by as much as 10×, down to about 3% on modern CPUs, due to hardware changes that eliminate the need for the most expensive mitigations. Meanwhile, a JavaScript benchmark reveals approximately 20% overhead persists today because mitigations for Spectre V1 and Speculative Store Bypass have not become more efficient. Other workloads like virtual machines and single-process, compute-intensive applications did not show significant slowdowns on any of the processors we measured.
Zirui Neil Zhao, Houxiang Ji, Adam Morrison, Darko Marinov, Josep Torrellas
Proceedings of the 27th ACM International Conference on Architectural Support for Programming Languages and Operating Systems; https://doi.org/10.1145/3503222.3507724

Abstract:
In security frameworks for speculative execution, an instruction is said to reach its Visibility Point (VP) when it is no longer vulnerable to pipeline squashes. Before a potentially leaky instruction reaches its VP, it has to stall—unless a defense scheme such as invisible speculation provides protection. Unfortunately, either stalling or protecting the execution of pre-VP instructions typically has a performance cost. One way to attain low-overhead safe execution is to develop techniques that speed-up the advance of the VP from older to younger instructions. In this paper, we propose one such technique. We find that the progress of the VP for loads is mostly impeded by waiting until no memory consistency violations (MCVs) are possible. Hence, our technique, called , tries to make loads invulnerable to MCVs as early as possible—a process we call pinning the loads in the pipeline. The result is faster VP progress and a reduction in the execution overhead of defense schemes. In this paper, we describe the hardware needed by , and two possible designs with different tradeoffs between hardware requirements and performance. Our evaluation shows that is very effective: extending three popular defense schemes against speculative execution attacks with reduces their average execution overhead on SPEC17 and on SPLASH2/PARSEC applications by about 50%. For example, on SPEC17, the execution overhead of the three defense schemes decreases from to , from to , and from to .
MohammadKazem Taram, Ashish Venkat, Dean Tullsen
IEEE Design & Test, Volume 39, pp 49-57; https://doi.org/10.1109/mdat.2022.3152633

Abstract:
This paper presents Context-Sensitive Fencing (CSF), a micro-code-level mitigation for multiple variants of Spectre. CSF leverages the ability to dynamically alter the instruction stream via the decoder, to seamlessly inject new micro-ops, including fences, only when dynamic conditions indicate they are needed. This enables the processor to protect against the attack with minimal impact on the efficacy of key performance features such as speculative execution. This research also examines several alternative fence implementations, and introduces new types of fences which allow most dynamic reorderings of loads and stores while still preventing speculative accesses from changing visible microarchitectural state.
Jiyong Yu, Mengjia Yan, Artem Khyzha, Adam Morrison, Josep Torrellas, Christopher W. Fletcher
Communications of the ACM, Volume 64, pp 105-112; https://doi.org/10.1145/3491201

Abstract:
Speculative execution attacks present an enormous security threat, capable of reading arbitrary program data under malicious speculation, and later exfiltrating that data over microarchitectural covert channels. This paper proposes speculative taint tracking (STT), a high security and high performance hardware mechanism to block these attacks. The main idea is that it is safe to execute and selectively forward the results of speculative instructions that read secrets, as long as we can prove that the forwarded results do not reach potential covert channels. The technical core of the paper is a new abstraction to help identify all micro-architectural covert channels, and an architecture to quickly identify when a covert channel is no longer a threat. We further conduct a detailed formal analysis on the scheme in a companion document. When evaluated on SPEC06 workloads, STT incurs 8.5% or 14.5% performance overhead relative to an insecure machine.
IEEE Computer Architecture Letters, Volume 20, pp 162-165; https://doi.org/10.1109/lca.2021.3123408

Abstract:
Speculative side-channel attacks access sensitive data and use transmitters to leak the data during wrong-path execution. Various defenses have been proposed to prevent such information leakage. However, not all speculatively executed instructions are unsafe: Recent work demonstrates that speculation invariant instructions are independent of speculative control-flow paths and are guaranteed to eventually commit, regardless of the speculation outcome. Compile-time information coupled with run-time mechanisms can then selectively lift defenses for speculation invariant instructions, reclaiming some of the lost performance. Unfortunately, speculation invariant instructions can easily be manipulated by a form of speculative interference to leak information via a new side-channel that we introduce in this paper. We show that forward speculative interference where older speculative instructions interfere with younger speculation invariant instructions effectively turns them into transmitters for secret data accessed during speculation. We demonstrate forward speculative interference on actual hardware, by selectively filling the reorder buffer (ROB) with instructions, pushing speculative invariant instructions in-or-out of the ROB on demand, based on a speculatively accessed secret. This reveals the speculatively accessed secret, as the occupancy of the ROB itself becomes a new speculative side-channel.
Aniket Deshmukh, Yale N. Patt
MICRO-54: 54th Annual IEEE/ACM International Symposium on Microarchitecture; https://doi.org/10.1145/3466752.3480115

Abstract:
Modern OoO cores achieve high levels of performance using large instruction windows. Scaling the window size improves performance by making visible more of the parallelism present in programs. However, this leads to an exponential increase in area and power. We specify Criticality Driven Fetch (CDF), a new execution paradigm that preferentially fetches, allocates, and executes instructions on the critical path of the program. By skipping over non-critical instructions, critical instructions in the ROB can span a sequential instruction window larger than the size of the ROB. This increases the amount of parallelism that can be extracted from critical instructions, thereby improving performance. In our implementation, CDF improves performance by (a) increasing the MLP for independent loads executing concurrently, (b) fetching critical path loads past hard-to-predict branches (by resolving them earlier), and (c) by initiating last level cache misses that cannot be parallelized earlier. Accelerating critical loads using CDF achieves a 6.1% IPC improvement over a baseline OoO core with prefetching. Compared to Precise Runahead, the prior state of the art work on accelerating last level cache misses on the core, we provide better performance and reduce memory traffic and energy consumption by 4.0% and 7.2% respectively.
Jan Philipp Thoma, Jakob Feldtkeller, Markus Krausz, Tim Güneysu, Daniel J. Bernstein
24th International Symposium on Research in Attacks, Intrusions and Defenses; https://doi.org/10.1145/3471621.3471857

Abstract:
Recent research has revealed an ever-growing class of microarchitectural attacks that exploit speculative execution, a standard feature in modern processors. Proposed and deployed countermeasures involve a variety of compiler updates, firmware updates, and hardware updates. None of the deployed countermeasures have convincing security arguments, and many of them have already been broken. The obvious way to simplify the analysis of speculative-execution attacks is to eliminate speculative execution. This is normally dismissed as being unacceptably expensive, but the underlying cost analyses consider only software written for current instruction-set architectures, so they do not rule out the possibility of a new instruction-set architecture providing acceptable performance without speculative execution. A new ISA requires compiler and hardware updates, but these are happening in any case. This paper introduces BasicBlocker, a generic ISA modification that works for all common ISAs and that allows non-speculative CPUs to obtain most of the performance benefit that would have been provided by speculative execution. To demonstrate the feasibility of BasicBlocker, this paper defines a variant of the RISC-V ISA called BBRISC-V and provides a thorough evaluation on both a 5-stage in-order soft core and a superscalar out-of-order processor using an associated compiler and a variety of benchmarks.
Deeksha Dangwal, Zhizhou Zhang, Jedidiah R. Crandall, Timothy Sherwood
2021 International Symposium on Secure and Private Execution Environment Design (SEED) pp 150-162; https://doi.org/10.1109/seed51797.2021.00027

Abstract:
Application tuning requires a coordinated effort across hardware and software to achieve optimized application performance. Execution traces offer unique insights into a program’s behavior over real inputs and serve as an invaluable resource for hardware and software engineers during the co-optimization process. Unfortunately, these traces are rarely shared between technology partners because even the simplest address traces gathered from applications that utilize private data can divulge sensitive information. Developers must choose between sharing accurate and precise execution information that will lead to the best co-optimization results while protecting sensitive data. This is the fundamental tradeoff between utility and privacy in the context of program traces.Concurrently, global policy is moving in favor of providing users with privacy protections. As a field, we must develop tools, mechanisms, and primitives to uphold these regulatory protections. In this work, we utilize the leading industry standard: the LINDDUN privacy threat modeling method, to model the threats to privacy of traces. We leverage advances in information flow tracking techniques and LINDDUN’s mitigation strategies to prevent inadvertent leakage of information. We introduce multiple classes of privacy-enhancing tracing techniques that allow context-aware differentiation of what information should remain in the trace and in what amounts based on annotations of private user input. To explore how meaningful the privatized traces are, we compare cache simulation and prefetching properties. This new approach leaks as few as zero bits of sensitive information and has an order of magnitude better utility than prior work.
Wubing Wang, Guoxing Chen, YueQiang Cheng, , Zhiqiang Lin
Published: 9 July 2021
Abstract:
This paper presents Specularizer, a framework for uncovering speculative execution attacks using performance tracing features available in commodity processors. It is motivated by the practical difficulty of eradicating such vulnerabilities in the design of CPU hardware and operating systems and the principle of defense-in-depth. The key idea of Specularizer is the use of Hardware Performance Counters and Processor Trace to perform lightweight monitoring of production applications and the use of machine learning techniques for identifying the occurrence of the attacks during offline forensics analysis. Different from prior works that use performance counters to detect side-channel attacks, Specularizer monitors triggers of the critical paths of the speculative execution attacks, thus making the detection mechanisms robust to different choices of side channels used in the attacks. To evaluate Specularizer, we model all known types of exception-based and misprediction-based speculative execution attacks and automatically generate thousands of attack variants. Experimental results show that Specularizer yields superior detection accuracy and the online tracing of Specularizer incur reasonable overhead.
Mohamed Tarek Ibn Ziad, Miguel A. Arroyo, Evgeny Manzhosov, Ryan Piersma, Simha Sethumadhavan
2021 ACM/IEEE 48th Annual International Symposium on Computer Architecture (ISCA) pp 916-929; https://doi.org/10.1109/isca52012.2021.00076

Abstract:
Memory safety continues to be a significant software reliability and security problem, and low overhead and low complexity hardware solutions have eluded computer designers. In this paper, we explore a pathway to deployable memory safety defenses. Our technique builds on a recent trend in software: the usage of binning memory allocators. We observe that if memory allocation sizes (e.g., malloc sizes) are made an architectural feature, then it is possible to overcome many of the thorny issues with traditional approaches to memory safety such as compatibility with unsecured software and significant performance degradation. We show that our architecture, No-FAT, incurs an overhead of 8% on SPEC CPU2017 benchmarks, and our VLSI measurements show low power and area overheads. Finally, as No-FAT’s hardware is aware of the memory allocation sizes, it effectively mitigates certain speculative attacks (e.g., Spectre-V1) with no additional cost. When our solution is used for pre-deployment fuzz testing it can improve fuzz testing bandwidth by an order of magnitude compared to state-of-the-art approaches.
Xida Ren, Logan Moody, MohammadKazem Taram, Matthew Jordan, Dean M. Tullsen, Ashish Venkat
2021 ACM/IEEE 48th Annual International Symposium on Computer Architecture (ISCA) pp 361-374; https://doi.org/10.1109/isca52012.2021.00036

Abstract:
Modern Intel, AMD, and ARM processors translate complex instructions into simpler internal micro-ops that are then cached in a dedicated on-chip structure called the micro-op cache. This work presents an in-depth characterization study of the micro-op cache, reverse-engineering many undocumented features, and further describes attacks that exploit the micro-op cache as a timing channel to transmit secret information. In particular, this paper describes three attacks – (1) a same thread cross-domain attack that leaks secrets across the user-kernel boundary, (2) a cross-SMT thread attack that transmits secrets across two SMT threads via the micro-op cache, and (3) transient execution attacks that have the ability to leak an unauthorized secret accessed along a misspeculated path, even before the transient instruction is dispatched to execution, breaking several existing invisible speculation and fencing-based solutions that mitigate Spectre.
Marco Vassena, Craig Disselkoen, Klaus von Gleissenthall, Sunjay Cauligi, Rami Gökhan Kıcı, Ranjit Jhala, Dean Tullsen, Deian Stefan
Proceedings of the ACM on Programming Languages, Volume 5, pp 1-30; https://doi.org/10.1145/3434330

Abstract:
We introduce Blade, a new approach to automatically and efficiently eliminate speculative leaks from cryptographic code. Blade is built on the insight that to stop leaks via speculative execution, it suffices to cut the dataflow from expressions that speculatively introduce secrets ( sources ) to those that leak them through the cache ( sinks ), rather than prohibit speculation altogether. We formalize this insight in a static type system that (1) types each expression as either transient , i.e., possibly containing speculative secrets or as being stable , and (2) prohibits speculative leaks by requiring that all sink expressions are stable. Blade relies on a new abstract primitive, protect , to halt speculation at fine granularity. We formalize and implement protect using existing architectural mechanisms, and show how Blade’s type system can automatically synthesize a minimal number of protect s to provably eliminate speculative leaks. We implement Blade in the Cranelift WebAssembly compiler and evaluate our approach by repairing several verified, yet vulnerable WebAssembly implementations of cryptographic primitives. We find that Blade can fix existing programs that leak via speculation automatically , without user intervention, and efficiently even when using fences to implement protect .
Xin Jin, Ningmei Yu
IEICE Electronics Express; https://doi.org/10.1587/elex.18.20210041

Abstract:
Transient execution attack does not affect the state of processor microarchitecture, which breaks the traditional definition of correct execution. It not only brings great challenges to the industrial product security, but also opens up a new research direction for the academic community. This paper proposes a defense mechanism for SMT processors against launching transient execution attacks using shared cache. The main structure includes two parts, a security shadow label and a transient execution cache. In the face of the side channel attacks widely used by transient execution attack, our defense mechanism adds a security shadow label to the memory request from the thread with high security requirement, so that the shared cache can distinguish the cache requests from different security level threads. At the same time, based on the record of security shadow label, the transient execution cache is used to preserve the historical data, so as to realize the repair of the cache state and prevent the modification of the cache state by misspeculated path from being exploited by attackers. Finally, the cache state is successfully guaranteed to be invisible to any attacker’s cache operations. This design only needs one operation similar to the normal memory access, thus reducing the memory access pressure. Compared with the existing defense schemes, our scheme can effectively prevent Spectre attack, and the overhead of performance is only 3.9%.
Brian C. Schwedock, Nathan Beckmann
2020 53rd Annual IEEE/ACM International Symposium on Microarchitecture (MICRO) pp 665-680; https://doi.org/10.1109/micro50266.2020.00061

Abstract:
The datacenter introduces new challenges for computer systems around tail latency and security. This paper argues that dynamic NUCA techniques are a better solution to these challenges than prior cache designs. We show that dynamic NUCA designs can meet tail-latency deadlines with much less cache space than prior work, and that they also provide a natural defense against cache attacks. Unfortunately, prior dynamic NUCAs have missed these opportunities because they focus exclusively on reducing data movement.We present Jumanji, a dynamic NUCA technique designed for tail latency and security. We show that prior last-level cache designs are vulnerable to new attacks and offer imperfect performance isolation. Jumanji solves these problems while significantly improving performance of co-running batch applications. Moreover, Jumanji only requires lightweight hardware and a few simple changes to system software, similar to prior D-NUCAs. At 20 cores, Jumanji improves batch weighted speedup by 14% on average, vs. just 2% for a non-NUCA design with weaker security, and is within 2% of an idealized design.
Samira Mirbagher-Ajorpaz, Gilles Pokam, Esmaeil Mohammadian-Koruyeh, Elba Garza, Nael Abu-Ghazaleh, Daniel A. Jimenez
2020 53rd Annual IEEE/ACM International Symposium on Microarchitecture (MICRO) pp 1124-1137; https://doi.org/10.1109/micro50266.2020.00093

Abstract:
Detecting microarchitectural attacks is critical given their proliferation in recent years. Many of these attacks exhibit intrinsic behaviors essential to the nature of their operation, such as creating contention or misspeculation. This study systematically investigates the microarchitectural footprints of hardware-based attacks and shows how they can be detected and classified using an efficient hardware predictor. We present a methodology to use correlated microarchitectural statistics to design a hardware-based neural predictor capable of detecting and classifying microarchitectural attacks before data is leaked. Once a potential attack is detected, it can be proactively mitigated by triggering appropriate countermeasures.Our hardware-based detector, PerSpectron, uses perceptron learning to identify and classify attacks. Perceptron-based prediction has been successfully used in branch prediction and other hardware-based applications. PerSpectron has minimal performance overhead. The statistics being monitored have similar overhead to already existing performance monitoring counters. Additionally, PerSpectron operates outside the processor’s critical paths, offering security without added computation delay. Our system achieves a usable detection rate for detecting attacks such as SpectreV1, SpectreV2, SpectreRSB, Meltdown, breakingKSLR, Flush+Flush, Flush+Reload, Prime+Probe as well as cache-attack calibration programs. We also believe that the large number of diverse microarchitectural features offers both evasion resilience and interpretability—features not present in previous hardware security detectors. We detect these attacks early enough to avoid any data leakage, unlike previous work that triggers countermeasures only after data has been exposed.
Sam Ainsworth, Timothy M. Jones
2020 ACM/IEEE 47th Annual International Symposium on Computer Architecture (ISCA) pp 132-144; https://doi.org/10.1109/isca45697.2020.00022

Abstract:
The disclosure of the Spectre speculative-execution attacks in January 2018 has left a severe vulnerability that systems are still struggling with how to patch. The solutions that currently exist tend to have incomplete coverage, perform badly, or have highly undesirable performance edge cases.MuonTrap allows processors to continue to speculate, avoiding significant reductions in performance, without impacting security. We instead prevent the propagation of any state based on speculative execution, by placing the results of speculative cache accesses into a small, fast L0 filter cache, that is non-inclusive, non-exclusive with the rest of the cache hierarchy. This isolates all parts of the system that can’t be quickly cleared on any change in threat domain. MuonTrap uses these speculative filter caches, which are cleared on context and protection-domain switches, along with a series of extensions to the cache coherence protocol and prefetcher. This renders systems immune to cross-domain information leakage via Spectre and a host of similar attacks based on speculative execution, with low performance impact and few changes to the CPU design.
Page of 1
Articles per Page
by
Show export options
  Select all
Back to Top Top