Results: 9

(searched for: doi:10.1145/3297858.3304040)
Zhengming Yi, Yiping Yao, Kai Chen
50th International Conference on Parallel Processing; https://doi.org/10.1145/3472456.3472475

Abstract:
Universal constructions are attractive as they can turn a sequential implementation of any data structure into a concurrent implementation. However, existing universal constructions have limitations, such as imposing high copying overhead, or poor scalability on NUMA systems mainly due to their lack of NUMA-aware design principles. To overcome these limitations, this paper introduces CR, a universal construction that provides highly scalable updates on NUMA systems while offering fast read-side performance. CR achieves NUMA-awareness by utilizing delegation within a NUMA node and a global shared log to maintain the consistency of replicas of data structures across nodes. Using CR does not require expertise in concurrent data structure design. Our evaluation shows that CR has up to 11.2 times better performance compared to a state-of-the-art universal construction CX on our tested sequential data structures. To demonstrate the effectiveness and applicability of CR, we have applied CR to an in-memory database system. The database shows up to 18.1 times better performance compared to the original version.
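The CR abstract keeps per-node replicas consistent through a global shared log. The Python sketch below shows that log-based replica idea under stated assumptions. It is not CR's implementation. The class names `SharedLog` and `Replica` are hypothetical. A per-replica lock stands in for intra-node delegation, and reads also take that lock for simplicity.

```python
import threading
from typing import Any, Callable, List

Op = Callable[[Any], Any]  # an operation applied to a sequential data structure


class SharedLog:
    """Global append-only log of update operations, shared by all replicas."""

    def __init__(self) -> None:
        self._entries: List[Op] = []
        self._lock = threading.Lock()

    def append(self, op: Op) -> None:
        with self._lock:
            self._entries.append(op)

    def read_from(self, start: int) -> List[Op]:
        # Return a copy of the entries a replica has not applied yet.
        with self._lock:
            return self._entries[start:]


class Replica:
    """A private copy of a sequential structure, e.g. one per NUMA node.

    The per-replica lock is a stand-in for intra-node delegation: only one
    thread at a time touches this replica's sequential copy.
    """

    def __init__(self, log: SharedLog, make_structure: Callable[[], Any]) -> None:
        self._log = log
        self._data = make_structure()
        self._applied = 0  # number of log entries already replayed locally
        self._lock = threading.Lock()

    def _catch_up(self) -> None:
        # Replay missing updates in log order so every replica converges.
        for op in self._log.read_from(self._applied):
            op(self._data)
            self._applied += 1

    def update(self, op: Op) -> None:
        with self._lock:
            self._log.append(op)
            self._catch_up()

    def read(self, query: Callable[[Any], Any]) -> Any:
        with self._lock:
            self._catch_up()
            return query(self._data)


log = SharedLog()
node0 = Replica(log, list)  # replica for a hypothetical node 0
node1 = Replica(log, list)  # replica for a hypothetical node 1
node0.update(lambda xs: xs.append(1))
node1.update(lambda xs: xs.append(2))
print(node0.read(list))  # [1, 2]
print(node1.read(list))  # [1, 2]
```

Every replica replays the same operations in log order, so the operations must be deterministic. A production design would also need to bound log growth and keep reads cheaper than this locked catch-up.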
Published: 12 March 2021
The Journal of Supercomputing, Volume 77, pp 10827-10849; https://doi.org/10.1007/s11227-021-03719-2

The publisher has not yet granted permission to display this abstract.
Haichi Wang, Zan Wang, Jun Sun, Shuang Liu, Ayesha Sadiq, Yuan-Fang Li
Proceedings of the 35th IEEE/ACM International Conference on Automated Software Engineering; https://doi.org/10.1145/3324884.3416625

Abstract:
The existing concurrency model for Java (or C) requires programmers to design and implement thread-safe classes by explicitly acquiring and releasing locks. Such a model is error-prone and is the reason for many concurrency bugs. While there are alternative models like transactional memory, manually writing locks remains prevalent in practice. In this work, we propose AutoLock, which aims to solve the problem by fully automatically generating thread-safe classes. Given a class that is assumed to be correct with sequential clients, AutoLock automatically generates a linearizable thread-safe class, and does so without requiring a specification of the class. AutoLock takes three steps: (1) infer access annotations (i.e., abstract information on how variables are accessed and aliased), (2) synthesize a locking policy based on the access annotations, and (3) consistently implement the locking policy. AutoLock has been evaluated on a set of benchmark programs, and the results show that AutoLock generates thread-safe classes effectively and could have prevented existing concurrency bugs.
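The Python sketch below gives a loose illustration of steps (2) and (3). It assumes step (1) is already done and that the access annotations are hand-written. From those annotations it derives a per-field locking policy. The annotation format and `make_thread_safe` are hypothetical. AutoLock's actual inference and synthesis are more refined than this.

```python
import threading
from contextlib import ExitStack
from typing import Dict, Set


def make_thread_safe(cls: type, accesses: Dict[str, Set[str]]) -> type:
    """Wrap each annotated method of a sequential class with per-field locks.

    `accesses` is a hypothetical annotation format: method name -> set of
    field names the method reads or writes.
    """
    all_fields = sorted({f for fields in accesses.values() for f in fields})
    # One lock per field, shared by all instances (a coarse simplification).
    locks = {f: threading.Lock() for f in all_fields}

    class ThreadSafe(cls):
        pass

    for name, fields in accesses.items():
        original = getattr(cls, name)
        # Acquire locks in one global (sorted) order to rule out deadlock.
        ordered = [locks[f] for f in sorted(fields)]

        def wrapper(self, *args, _orig=original, _locks=ordered, **kwargs):
            with ExitStack() as stack:
                for lock in _locks:
                    stack.enter_context(lock)
                # All needed locks are held for the whole call, so the call is
                # atomic with respect to every method touching the same fields.
                return _orig(self, *args, **kwargs)

        setattr(ThreadSafe, name, wrapper)

    ThreadSafe.__name__ = "ThreadSafe" + cls.__name__
    return ThreadSafe


class Stats:
    """A sequential class, correct only with a single thread."""

    def __init__(self):
        self.hits = 0
        self.misses = 0

    def hit(self):
        self.hits += 1

    def miss(self):
        self.misses += 1

    def total(self):
        return self.hits + self.misses


SafeStats = make_thread_safe(
    Stats, {"hit": {"hits"}, "miss": {"misses"}, "total": {"hits", "misses"}}
)
stats = SafeStats()
workers = [threading.Thread(target=lambda: [stats.hit() for _ in range(1000)]) for _ in range(4)]
workers += [threading.Thread(target=lambda: [stats.miss() for _ in range(500)]) for _ in range(2)]
for t in workers:
    t.start()
for t in workers:
    t.join()
print(stats.total())  # 5000
```

`hit` and `miss` touch disjoint fields, so they never block each other. `total` takes both locks in a fixed order. The sketch ignores aliasing between objects, and its locks are per class rather than per instance.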
Jeseong Yeon, LeeJu Kim, Youil Han, Hyeon Gyu Lee, Eunji Lee, Bryan S. Kim
Proceedings of the 21st International Middleware Conference; https://doi.org/10.1145/3423211.3425672

Abstract:
Multi-version concurrency control is a widely employed concurrency control mechanism, as it allows non-blocking accesses while providing isolation among transactions. However, maintaining multiple versions increases the latency of both point lookups and range retrievals because of the overhead of finding the right version. In particular, the append-only skip list, widely used in state-of-the-art key-value stores (KVS), shows significant performance degradation due to its append-only nature. This paper presents a novel skip list implementation called JellyFish. JellyFish reduces the overhead of multi-version concurrency control by separating the per-key updates from the key indexing. We implement our design on top of RocksDB and compare it against a wide variety of data structures. Our evaluation with micro-benchmarks and real-world workloads shows that we not only improve throughput by up to 93% but also reduce the latency of update operations by up to 42%.
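The following sketch shows the layout idea of separating per-key updates from key indexing. It assumes two simplifications. A sorted Python list searched with `bisect` stands in for the skip list, and monotonically increasing sequence numbers serve as versions. The code is single-threaded. It illustrates only the data layout, not JellyFish's concurrency control or its RocksDB integration. The class names are hypothetical.

```python
import bisect
from typing import Any, List, Optional, Tuple


class VersionChain:
    """All versions of one key, in increasing sequence-number order."""

    def __init__(self) -> None:
        self.seqnos: List[int] = []
        self.values: List[Any] = []

    def add(self, seqno: int, value: Any) -> None:
        self.seqnos.append(seqno)
        self.values.append(value)

    def visible(self, snapshot: int) -> Optional[Any]:
        # Newest version whose sequence number is <= snapshot, if any.
        i = bisect.bisect_right(self.seqnos, snapshot)
        return self.values[i - 1] if i > 0 else None


class KeyIndexedStore:
    """Sorted key index (stand-in for a skip list) with per-key version chains."""

    def __init__(self) -> None:
        self._keys: List[str] = []             # each key appears exactly once
        self._chains: List[VersionChain] = []  # parallel to _keys
        self._seq = 0

    def _find(self, key: str) -> Tuple[int, bool]:
        i = bisect.bisect_left(self._keys, key)
        return i, i < len(self._keys) and self._keys[i] == key

    def put(self, key: str, value: Any) -> int:
        self._seq += 1
        i, found = self._find(key)
        if not found:
            # Only a brand-new key changes the index.
            self._keys.insert(i, key)
            self._chains.insert(i, VersionChain())
        self._chains[i].add(self._seq, value)
        return self._seq

    def get(self, key: str, snapshot: Optional[int] = None) -> Optional[Any]:
        i, found = self._find(key)
        if not found:
            return None
        return self._chains[i].visible(self._seq if snapshot is None else snapshot)

    def scan(self, lo: str, hi: str, snapshot: Optional[int] = None) -> List[Tuple[str, Any]]:
        # Inclusive range query; one chain lookup per distinct key.
        snap = self._seq if snapshot is None else snapshot
        start = bisect.bisect_left(self._keys, lo)
        end = bisect.bisect_right(self._keys, hi)
        out = []
        for key, chain in zip(self._keys[start:end], self._chains[start:end]):
            value = chain.visible(snap)
            if value is not None:
                out.append((key, value))
        return out

    def index_size(self) -> int:
        return len(self._keys)


store = KeyIndexedStore()
s1 = store.put("a", 1)
store.put("b", 2)
store.put("a", 3)
print(store.get("a"))               # 3
print(store.get("a", snapshot=s1))  # 1
print(store.scan("a", "z"))         # [('a', 3), ('b', 2)]
print(store.index_size())           # 2
```

In an append-only multi-version skip list, each update would insert a new (key, version) node into the index. Here, a repeated `put` on the same key only extends that key's chain. Lookups and scans therefore pass over each key once, however many versions it has.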
Paul E. McKenney, Joel Fernandes, Silas Boyd-Wickizer, Jonathan Walpole
ACM SIGOPS Operating Systems Review, Volume 54, pp 47-63; https://doi.org/10.1145/3421473.3421481

Abstract:
Read-copy update (RCU) is a scalable high-performance synchronization mechanism implemented in the Linux kernel. RCU's novel properties include support for concurrent forward progress for readers and writers as well as highly optimized inter-CPU synchronization. RCU was introduced into the Linux kernel eighteen years ago and most subsystems now use RCU. This paper discusses the requirements that drove the development of RCU, the design and API of the Linux RCU implementation, and how kernel developers apply RCU.
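Below is a rough user-space analogue of the read-copy-update pattern. It assumes CPython, where rebinding an attribute is atomic from a reader's point of view. Readers take no lock, while updaters copy, modify, and then publish a new version. The Linux names `rcu_dereference()`, `rcu_assign_pointer()`, and `synchronize_rcu()` appear only in comments, as points of comparison. Garbage collection stands in for the kernel's grace-period machinery. `RcuMap` is a hypothetical name.

```python
import threading
from types import MappingProxyType
from typing import Any, Mapping


class RcuMap:
    """Read-copy-update style map: lock-free readers, copying writers."""

    def __init__(self) -> None:
        self._current: Mapping[str, Any] = MappingProxyType({})
        self._update_lock = threading.Lock()  # serializes updaters only

    def read(self) -> Mapping[str, Any]:
        # Readers take no lock; they grab the currently published version
        # (a loose analogue of rcu_dereference()).
        return self._current

    def update(self, key: str, value: Any) -> None:
        with self._update_lock:
            new = dict(self._current)              # copy
            new[key] = value                       # update the private copy
            self._current = MappingProxyType(new)  # publish (cf. rcu_assign_pointer())
        # The kernel would wait for a grace period (e.g. synchronize_rcu())
        # before freeing the old version; here the garbage collector frees
        # it once no reader still holds a reference.


table = RcuMap()
old = table.read()
table.update("x", 1)
print(dict(old))          # {}
print(table.read()["x"])  # 1
```

A reader that fetched `old` before the update keeps a consistent snapshot, and newer readers see the new version. The read-only `MappingProxyType` view stops readers from mutating a published version in place.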
, Yiping Yao
Concurrency and Computation: Practice and Experience, Volume 32; https://doi.org/10.1002/cpe.5964

The publisher has not yet granted permission to display this abstract.
Seongjae Park, Paul E. McKenney, Laurent Dufour, Heon Y. Yeom
Proceedings of the Fifteenth European Conference on Computer Systems; https://doi.org/10.1145/3342195.3387527

Abstract:
Read-copy update (RCU) can provide ideal scalability for read-mostly workloads, but some believe that it performs poorly for updates. This belief is due to the lack of RCU-centric update synchronization mechanisms. RCU instead works with a range of update-side mechanisms, such as locking. In fact, many developers embrace simplicity by using global locking. Logging, hardware transactional memory, or fine-grained locking can provide better scalability, but each of these approaches has limitations, such as imposing overhead on readers or poor scalability on non-uniform memory access (NUMA) systems, mainly due to their lack of NUMA-aware design principles. This paper introduces an RCU extension (RCX) that provides highly scalable RCU updates on NUMA systems while retaining RCU's read-side benefits. RCX is a software-based synchronization mechanism combining hardware transactional memory (HTM) and traditional locking based on our NUMA-aware design principles for RCU. Microbenchmarks on a NUMA system with 144 hardware threads show that RCX has up to 22.6 times better performance and up to 145 times lower HTM abort rates compared to a state-of-the-art RCU/HTM combination. To demonstrate the effectiveness and applicability of RCX, we have applied RCX to parallelize parts of the Linux kernel memory management system and an in-memory database system. The optimized kernel and the database show up to 24 and 17 times better performance compared to their original versions, respectively.
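RCX combines HTM with traditional locking. Python has no HTM, so the sketch below simulates only the generic retry-then-fallback pattern. Each optimistic attempt works on a private copy and is validated by a version counter at commit time. After `max_attempts` aborts, the update falls back to a lock. `ElidedStore` and its parameters are hypothetical, and nothing in the sketch reflects RCX's NUMA-aware design principles.

```python
import threading
from typing import Any, Callable, Dict

Op = Callable[[Dict[str, Any]], None]


class ElidedStore:
    """Optimistic attempts with a lock fallback (software simulation of HTM + lock)."""

    def __init__(self, max_attempts: int = 3) -> None:
        self.data: Dict[str, Any] = {}   # published versions are never mutated in place
        self.max_attempts = max_attempts
        self._version = 0
        self._commit = threading.Lock()   # makes validate-and-commit atomic
        self.fallback = threading.Lock()  # traditional lock used after repeated aborts
        self.commits = {"optimistic": 0, "fallback": 0}

    def _try_optimistic(self, op: Op) -> bool:
        if self.fallback.locked():
            return False                  # like a transaction aborting while the lock is held
        start = self._version
        snapshot = dict(self.data)        # work on a private copy
        op(snapshot)
        with self._commit:
            if self._version != start or self.fallback.locked():
                return False              # conflict detected: abort and let the caller retry
            self.data = snapshot
            self._version += 1
            self.commits["optimistic"] += 1
        return True

    def update(self, op: Op) -> None:
        for _ in range(self.max_attempts):
            if self._try_optimistic(op):
                return
        with self.fallback:               # give up on speculation
            with self._commit:
                new = dict(self.data)
                op(new)
                self.data = new
                self._version += 1
                self.commits["fallback"] += 1


def increment(d: Dict[str, Any]) -> None:
    d["n"] = d.get("n", 0) + 1


elided = ElidedStore()
threads = [threading.Thread(target=lambda: [elided.update(increment) for _ in range(200)]) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(elided.data["n"])              # 800
print(sum(elided.commits.values()))  # 800
```

Every update commits exactly once, through either the optimistic path or the fallback path, so the counter comes out right whichever mix occurs. The split in `commits` is a crude stand-in for the abort behavior the abstract measures.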