Multimodal group action clustering in meetings

15 October 2004

conference paper
Published by Association for Computing Machinery (ACM)

pp. 54-62
https://doi.org/10.1145/1026799.1026810

Abstract

We address the problem of clustering multimodal group actions in meetings using a two-layer HMM framework. Meetings are structured as sequences of group actions. Our approach aims at creating one cluster for each group action, where the number of group actions and the action boundaries are unknown a priori. In our framework, the first layer models typical actions of individuals in meetings using supervised HMM learning and low-level audio-visual features. A number of options that explicitly model certain aspects of the data (e.g., asynchrony) were considered. The second layer models the group actions using unsupervised HMM learning. The two layers are linked by a set of probability-based features produced by the individual action layer as input to the group action layer. The methodology was assessed on a set of multimodal turn-taking group actions, using a public five-hour meeting corpus. The results show that the use of multiple modalities and the layered framework are advantageous, compared to various baseline methods.
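The link between the two layers can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes discrete observation symbols instead of the paper's audio-visual features, and the action names ("speaking", "silent"), the model parameters, and the helper functions are all hypothetical. Each participant's window is scored against every individual-action HMM with the forward algorithm, the scores are normalized into probabilities, and the probabilities of all participants are concatenated into one feature vector for the group layer.

```python
import math

def log_forward(obs, start, trans, emit):
    """Log-likelihood of a discrete sequence under one HMM (scaled forward algorithm)."""
    n = len(start)
    alpha = [start[s] * emit[s][obs[0]] for s in range(n)]
    log_lik = 0.0
    for t in range(len(obs)):
        if t > 0:
            # Propagate through the transition matrix, then weight by the emission.
            alpha = [sum(alpha[r] * trans[r][s] for r in range(n)) * emit[s][obs[t]]
                     for s in range(n)]
        scale = sum(alpha)
        log_lik += math.log(scale)
        alpha = [a / scale for a in alpha]  # rescale to avoid numerical underflow
    return log_lik

def action_posteriors(obs, action_models):
    """Layer 1 output: normalized probability of each individual action for one window."""
    names = sorted(action_models)
    scores = [log_forward(obs, *action_models[name]) for name in names]
    top = max(scores)
    weights = [math.exp(s - top) for s in scores]
    total = sum(weights)
    return [w / total for w in weights]

def group_features(windows_per_person, action_models):
    """Layer link: concatenate every participant's action probabilities."""
    features = []
    for obs in windows_per_person:
        features.extend(action_posteriors(obs, action_models))
    return features

# Hypothetical 2-state models over two symbols (0 = low audio energy, 1 = high).
# In a supervised setting these parameters would be estimated from labeled data.
start = [0.5, 0.5]
trans = [[0.8, 0.2], [0.2, 0.8]]
action_models = {
    "speaking": (start, trans, [[0.2, 0.8], [0.1, 0.9]]),
    "silent": (start, trans, [[0.9, 0.1], [0.8, 0.2]]),
}

# One observation window per participant (two participants here).
windows = [[1, 1, 1, 0, 1], [0, 0, 0, 0, 1]]
features = group_features(windows, action_models)
print(features)  # order per participant: [P(silent), P(speaking)]
```

A sequence of such feature vectors, one per time window, would be the input to the group action layer. The unsupervised second-layer HMM, which clusters these vectors without knowing the number of group actions in advance, is not shown here.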

