JointDNN: An Efficient Training and Inference Engine for Intelligent Mobile Cloud Computing Services

16 October 2019

journal article
research article
Published by Institute of Electrical and Electronics Engineers (IEEE) in IEEE Transactions on Mobile Computing

Vol. 20 (2), 565-576
https://doi.org/10.1109/tmc.2019.2947893

Abstract

Deep learning models are being deployed in many mobile intelligent applications. End-side services, such as intelligent personal assistants, autonomous cars, and smart home services, often employ either simple local models on the mobile device or complex remote models on the cloud. However, recent studies have shown that partitioning the DNN computations between the mobile device and the cloud can improve both latency and energy efficiency. In this paper, we propose an efficient, adaptive, and practical engine, JointDNN, for collaborative computation between a mobile device and the cloud for DNNs in both the inference and training phases. JointDNN not only provides an energy- and performance-efficient method of querying DNNs for the mobile side but also benefits the cloud server by reducing its workload and communications compared to the cloud-only approach. Given the DNN architecture, we investigate the efficiency of processing some layers on the mobile device and some layers on the cloud server. We provide optimization formulations at layer granularity for forward and backward propagation in DNNs, which can adapt to mobile battery limitations, cloud server load constraints, and quality of service. JointDNN achieves up to 18 and 32 times reductions in the latency and mobile energy consumption of querying DNNs, respectively, compared to the status-quo approaches.
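
To make the idea of layer-granularity partitioning concrete, the following is a minimal illustrative sketch and not the formulation used in the paper. It assumes hypothetical per-layer compute costs for each side (`mobile_cost`, `cloud_cost`) and a hypothetical, direction-independent cost of moving each layer's input across the network (`transfer_cost`, with one extra entry for the final output). The function name `partition_layers` is likewise an assumption made for illustration. The input is assumed to start on the mobile device and the result must end there.

```python
from typing import List, Sequence, Tuple


def partition_layers(
    mobile_cost: Sequence[float],
    cloud_cost: Sequence[float],
    transfer_cost: Sequence[float],
) -> Tuple[float, List[str]]:
    """Pick mobile or cloud for each layer to minimize a total cost.

    transfer_cost[i] is the (assumed symmetric) cost of moving the input of
    layer i between the two sides; transfer_cost[n] covers the final output.
    All cost values are hypothetical inputs, e.g. latency or energy.
    """
    n = len(mobile_cost)
    if len(cloud_cost) != n or len(transfer_cost) != n + 1:
        raise ValueError("expected n compute costs per side and n + 1 transfer costs")

    compute = (mobile_cost, cloud_cost)  # index 0 = mobile, 1 = cloud
    inf = float("inf")
    best = [0.0, inf]  # the input data starts on the mobile device
    back: List[List[int]] = []

    for i in range(n):
        cur = [inf, inf]
        ptr = [0, 0]
        for side in (0, 1):
            for prev in (0, 1):
                move = transfer_cost[i] if prev != side else 0.0
                cost = best[prev] + move + compute[side][i]
                if cost < cur[side]:
                    cur[side] = cost
                    ptr[side] = prev
        back.append(ptr)
        best = cur

    # The result has to be delivered to the mobile device.
    final = [best[0], best[1] + transfer_cost[n]]
    side = 0 if final[0] <= final[1] else 1
    total = final[side]

    # Walk the back-pointers to recover the per-layer placement.
    placement = [0] * n
    for i in range(n - 1, -1, -1):
        placement[i] = side
        side = back[i][side]
    return total, ["mobile" if s == 0 else "cloud" for s in placement]


if __name__ == "__main__":
    total, plan = partition_layers(
        mobile_cost=[4, 10, 10, 2],
        cloud_cost=[1, 1, 1, 1],
        transfer_cost=[20, 3, 8, 8, 1],
    )
    print(total, plan)
```

With these made-up numbers, uploading the raw input is expensive while the first layer's output is cheap to send, so the sketch keeps the first layer on the mobile device and runs the rest on the cloud. Constraints such as battery limits or server load, which the abstract mentions, are not modeled here.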

Funding Information

NSF SHF
DARPA MTO
USC Annenberg Fellowship

This publication has 27 references indexed in Scilit.

Cited by 157 articles