Multi-modal Convolutional Neural Networks for Activity Recognition

Abstract
A convolutional neural network (CNN), which comprises one or more convolutional and pooling layers followed by one or more fully-connected layers, has gained popularity due to its ability to learn rich representations from images or speech, capturing local dependency and invariance to slight distortions. CNNs have recently been applied to the problem of activity recognition, where 1D kernels capture local dependency over time in a series of observations measured by inertial sensors (3-axis accelerometers and gyroscopes). In this paper we present a multi-modal CNN that uses 2D kernels in both convolutional and pooling layers, capturing local dependency over time as well as spatial dependency over sensors. Experiments on benchmark datasets demonstrate the high performance of our multi-modal CNN compared to several state-of-the-art methods.
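To make the architectural idea concrete, the following is a minimal PyTorch sketch of a CNN with 2D kernels spanning both the sensor-axis dimension and the time dimension, as the abstract describes. It is not the authors' exact architecture: the kernel sizes, channel counts, window length, and class count are illustrative assumptions.

```python
import torch
import torch.nn as nn

class MultiModalCNN(nn.Module):
    """Sketch of a CNN with 2D kernels over (sensor axes x time).

    Input shape: (batch, 1, S, T), where S is the number of sensor
    axes (e.g., 6 for a 3-axis accelerometer plus a 3-axis gyroscope)
    and T is the window length in samples. All layer sizes below are
    illustrative assumptions, not the paper's reported settings.
    """

    def __init__(self, num_axes=6, window=128, num_classes=6):
        super().__init__()
        self.features = nn.Sequential(
            # 2D kernel: 3 adjacent sensor axes x 5 time steps,
            # so spatial and temporal dependency are captured jointly.
            nn.Conv2d(1, 32, kernel_size=(3, 5)),
            nn.ReLU(),
            # 2D pooling likewise spans both sensors and time.
            nn.MaxPool2d(kernel_size=(2, 2)),
            nn.Conv2d(32, 64, kernel_size=(2, 5)),
            nn.ReLU(),
            nn.MaxPool2d(kernel_size=(1, 2)),
        )
        # Infer the flattened feature size with a dummy forward pass.
        with torch.no_grad():
            n = self.features(torch.zeros(1, 1, num_axes, window)).numel()
        self.classifier = nn.Sequential(
            nn.Flatten(),
            nn.Linear(n, 128),
            nn.ReLU(),
            nn.Linear(128, num_classes),
        )

    def forward(self, x):
        return self.classifier(self.features(x))

# Usage: a batch of 8 windows, 6 sensor axes, 128 samples each.
model = MultiModalCNN()
logits = model(torch.randn(8, 1, 6, 128))
print(logits.shape)  # torch.Size([8, 6])
```

The contrast with prior work is in the kernel shape: a 1D temporal approach would use kernels of height 1 (one sensor axis at a time), whereas the (3, 5) and (2, 5) kernels above mix information across neighboring sensor axes within each convolution.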
