Waterwave: A GPU Memory Flow Engine for Concurrent DNN Training
- 22 May 2023
- journal article
- Published by the Institute of Electrical and Electronics Engineers (IEEE) in IEEE Transactions on Computers
- Vol. PP (ISSN 0018-9340), pp. 1-14
- https://doi.org/10.1109/tc.2023.3278530
Abstract
Training Deep Neural Networks (DNNs) concurrently is becoming increasingly important for deep learning practitioners, e.g., in hyperparameter optimization (HPO) and neural architecture search (NAS). GPU memory capacity is the main impediment that prevents multiple DNNs from being trained on the same GPU, because training consumes a large amount of memory. In this paper, we propose Waterwave, a GPU memory flow engine for concurrent deep learning training. First, to address the memory explosion caused by the long time lag between memory allocation and deallocation, we develop an allocator tailored for multiple streams. By making the allocator aware of stream information, allocation is prioritized according to each chunk's synchronization attributes, allowing usable memory to be provided right after scheduling rather than waiting for it to actually be released after GPU computation. Second, Waterwave partitions the compute graph into a set of contiguous node groups and then performs finer-grained scheduling, NodeGroup pipeline execution, to guarantee a proper ordering of memory requests. Waterwave achieves up to 96.8% of the maximum batch size of solo training. Additionally, in scenarios with high memory demand, Waterwave outperforms existing spatial sharing and temporal sharing by up to 12x and 1.49x, respectively.
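To illustrate the idea of a stream-aware, prioritized allocation, the toy sketch below models a free list whose chunks remember which stream last used them and whether that work is known to be complete. A request first tries to reuse a chunk freed on the same stream (safe, since work on one stream executes in order), then a synchronized chunk from another stream, and only then allocates fresh memory. All names (`Chunk`, `StreamAwareAllocator`, `mark_synced`) are hypothetical; this is a minimal illustration of the scheduling principle, not the paper's actual CUDA implementation.

```python
from dataclasses import dataclass

@dataclass
class Chunk:
    size: int
    stream: int           # stream that last used this chunk
    synced: bool = False  # True once the GPU work using it is known complete

class StreamAwareAllocator:
    """Toy free-list allocator that prioritizes chunks by sync state."""

    def __init__(self):
        self.free_chunks: list[Chunk] = []

    def free(self, chunk: Chunk) -> None:
        # Freed logically; the GPU may still be using it until synced.
        self.free_chunks.append(chunk)

    def mark_synced(self, stream: int) -> None:
        # Called after a stream/event synchronization completes.
        for c in self.free_chunks:
            if c.stream == stream:
                c.synced = True

    def allocate(self, size: int, stream: int) -> Chunk:
        # Priority 1: chunk freed on the same stream (stream-ordered reuse).
        # Priority 2: chunk from another stream whose work has synchronized.
        # Otherwise: fall back to a fresh allocation.
        same_stream = [c for c in self.free_chunks
                       if c.stream == stream and c.size >= size]
        synced = [c for c in self.free_chunks if c.synced and c.size >= size]
        for pool in (same_stream, synced):
            if pool:
                chunk = min(pool, key=lambda c: c.size)  # best fit
                self.free_chunks.remove(chunk)
                chunk.stream, chunk.synced = stream, False
                return chunk
        return Chunk(size=size, stream=stream)
```

In this sketch, a chunk freed on stream 0 can be handed back to a new request on stream 0 immediately, while a request on stream 1 must either find a synchronized chunk or allocate anew; this mirrors how stream awareness shortens the effective allocation-to-reuse lag.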