DjiNN and Tonic
- 13 June 2015
- conference paper
- Published by Association for Computing Machinery (ACM) in Proceedings of the 42nd Annual International Symposium on Computer Architecture
Abstract
As applications such as Apple Siri, Google Now, Microsoft Cortana, and Amazon Echo continue to gain traction, web-service companies are adopting large deep neural networks (DNN) for machine learning challenges such as image processing, speech recognition, natural language processing, among others. A number of open questions arise as to the design of a server platform specialized for DNN and how modern warehouse scale computers (WSCs) should be outfitted to provide DNN as a service for these applications. In this paper, we present DjiNN, an open infrastructure for DNN as a service in WSCs, and Tonic Suite, a suite of 7 end-to-end applications that span image, speech, and language processing. We use DjiNN to design a high throughput DNN system based on massive GPU server designs and provide insights as to the varying characteristics across applications. After studying the throughput, bandwidth, and power properties of DjiNN and Tonic Suite, we investigate several design points for future WSC architectures. We investigate the total cost of ownership implications of having a WSC with a disaggregated GPU pool versus a WSC composed of homogeneous integrated GPU servers. We improve DNN throughput by over 120x for all but one application (40x for Facial Recognition) on an NVIDIA K40 GPU. On a GPU server composed of 8 NVIDIA K40s, we achieve near-linear scaling (around 1000x throughput improvement) for 3 of the 7 applications. Through our analysis, we also find that GPU-enabled WSCs improve total cost of ownership over CPU-only designs by 4-20x, depending on the composition of the workload.
This publication has 28 references indexed in Scilit:
- Thermal time shifting. Published by Association for Computing Machinery (ACM), 2015
- ImageNet Large Scale Visual Recognition Challenge. International Journal of Computer Vision, 2015
- A reconfigurable fabric for accelerating large-scale datacenter services. Published by Institute of Electrical and Electronics Engineers (IEEE), 2014
- A historical perspective of speech recognition. Communications of the ACM, 2014
- The Datacenter as a Computer: An Introduction to the Design of Warehouse-Scale Machines, Second edition. Synthesis Lectures on Computer Architecture, 2013
- Whare-map. Published by Association for Computing Machinery (ACM), 2013
- Bubble-flux. Published by Association for Computing Machinery (ACM), 2013
- A defect-tolerant accelerator for emerging high-performance applications. ACM SIGARCH Computer Architecture News, 2012
- Distributed GraphLab. Proceedings of the VLDB Endowment, 2012
- Gradient-based learning applied to document recognition. Proceedings of the IEEE, 1998