Caffe con Troll

Abstract
We present Caffe con Troll (CcT), a fully compatible end-to-end version of the popular framework Caffe with rebuilt internals. We built CcT to examine the performance characteristics of training and deploying general-purpose convolutional neural networks across different hardware architectures. We find that, by employing standard batching optimizations for CPU training, we achieve a 6.3× throughput improvement over Caffe on popular networks like CaffeNet. Moreover, with these improvements, the end-to-end training time for CNNs is directly proportional to the FLOPS delivered by the CPU, which enables us to efficiently train hybrid CPU-GPU systems for CNNs.
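The batching optimization mentioned in the abstract can be illustrated with a minimal NumPy sketch: convolution is lowered via im2col, the patch matrices of every image in the batch are stacked, and the work is issued as one large GEMM rather than many small ones. This is an illustrative sketch of the general technique, not CcT's actual implementation; the names `im2col` and `conv_batched` are hypothetical.

```python
import numpy as np

def im2col(x, kh, kw):
    """Unroll all (kh x kw) patches of a single-channel image into columns."""
    h, w = x.shape
    oh, ow = h - kh + 1, w - kw + 1
    cols = np.empty((kh * kw, oh * ow))
    for i in range(oh):
        for j in range(ow):
            cols[:, i * ow + j] = x[i:i + kh, j:j + kw].ravel()
    return cols

def conv_batched(images, kernel):
    """Convolve a batch of images with one kernel by stacking the im2col
    matrices of all images and issuing a single large matrix multiply,
    instead of one small multiply per image (better BLAS utilization)."""
    kh, kw = kernel.shape
    h, w = images[0].shape
    oh, ow = h - kh + 1, w - kw + 1
    # One wide matrix: patch columns from all images side by side.
    cols = np.hstack([im2col(x, kh, kw) for x in images])
    out = kernel.ravel() @ cols  # single GEMM over the whole batch
    return out.reshape(len(images), oh, ow)
```

On a CPU, fusing the batch into one GEMM lets the BLAS library amortize cache blocking and threading overhead across the whole batch, which is the kind of standard optimization the abstract credits for the throughput gain.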
Funding Information
  • Defense Advanced Research Projects Agency (FA8750-12-2-0335, FA8750-13-2-0039)
  • National Science Foundation (IIS-1353606)
  • National Institutes of Health (U54EB020405)
  • Office of Naval Research (N000141210041, N000141310129)
