automemcpy: a framework for automatic generation of fundamental memory operations
- 22 June 2021
- conference paper
- Published by Association for Computing Machinery (ACM) in Proceedings of the 2021 ACM SIGPLAN International Symposium on Memory Management
Abstract
Memory manipulation primitives (memcpy, memset, memcmp) are used by virtually every application, from high performance computing to user interfaces. They often consume a significant portion of CPU cycles. Because they are so ubiquitous and critical, they are provided by language runtimes and in particular by libc, the C standard library. These implementations are heavily optimized, typically written in hand-tuned assembly for each target architecture. In this article, we propose a principled alternative to hand-tuning these functions: (1) we profile the calls to these functions in their production environment and use this data to drive the important high-level algorithmic decisions, (2) we use a high-level language for the implementation and delegate the job of tuning the generated code to the compiler, and (3) we use constraint programming and automatic benchmarks to select the optimal high-level structure of the functions. We compile our memory function implementations using the same compiler toolchain that we use for application code, which lets us leverage the compiler further by enabling whole-program optimization. We have evaluated our approach by applying it to the fleet of one of the largest computing enterprises in the world. This work increased the performance of the fleet by 1%.
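To make point (2) of the abstract concrete, the following is a minimal C++ sketch of a size-dispatched copy routine written in a high-level language. It is an illustration, not the paper's implementation. The names `copy_block`, `copy_overlap`, `copy_loop`, and `sketch_memcpy` are hypothetical, and the size thresholds are arbitrary placeholders. In the approach the abstract describes, such thresholds would presumably be chosen from profiled call-size distributions and automatic benchmarks rather than fixed by hand.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Copy exactly N bytes. With a compile-time constant size, compilers
// typically lower this call to a few loads and stores.
template <std::size_t N>
inline void copy_block(char* dst, const char* src) {
  std::memcpy(dst, src, N);
}

// Copy `count` bytes, N <= count <= 2*N, using two N-byte blocks that
// may overlap in the middle. This avoids a branch per exact size.
template <std::size_t N>
inline void copy_overlap(char* dst, const char* src, std::size_t count) {
  assert(count >= N && count <= 2 * N);
  copy_block<N>(dst, src);
  copy_block<N>(dst + count - N, src + count - N);
}

// Copy `count` bytes, count >= N, with a loop of N-byte blocks followed
// by one final block aligned to the end of the buffer (it may overlap).
template <std::size_t N>
inline void copy_loop(char* dst, const char* src, std::size_t count) {
  assert(count >= N);
  std::size_t offset = 0;
  do {
    copy_block<N>(dst + offset, src + offset);
    offset += N;
  } while (offset < count - N);
  copy_block<N>(dst + count - N, src + count - N);
}

// Hypothetical top-level dispatch. The thresholds below are assumptions
// chosen for illustration only. Buffers must not overlap, as for memcpy.
void* sketch_memcpy(void* dst_v, const void* src_v, std::size_t count) {
  char* dst = static_cast<char*>(dst_v);
  const char* src = static_cast<const char*>(src_v);
  if (count == 0) return dst_v;
  if (count == 1) { copy_block<1>(dst, src); return dst_v; }
  if (count < 4)  { copy_overlap<2>(dst, src, count); return dst_v; }
  if (count < 8)  { copy_overlap<4>(dst, src, count); return dst_v; }
  if (count < 16) { copy_overlap<8>(dst, src, count); return dst_v; }
  if (count < 32) { copy_overlap<16>(dst, src, count); return dst_v; }
  if (count < 64) { copy_overlap<32>(dst, src, count); return dst_v; }
  copy_loop<32>(dst, src, count);
  return dst_v;
}
```

The sketch is deliberately not named `memcpy`: a routine that actually replaces the libc symbol cannot rely on `std::memcpy` internally, since the compiler could emit a call back into the function being defined. A real implementation would need a primitive that is guaranteed to expand inline, such as Clang's `__builtin_memcpy_inline`.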