Abstract
This paper discusses the technologies required to better understand and model human-human communication, and the use of these technologies to build computer-enhanced communication tools. As networks and computers become more pervasive, groups increasingly rely on technology to support communication and collaboration and to reduce the need for travel. New technologies based on advanced audio-visual signal processing and multimedia information analysis can have a positive impact on meetings and human communication. However, human communication is complex and is factored across several modalities. Addressing this problem requires major research efforts in several traditionally separate disciplines, including unconstrained speech recognition, visual scene analysis, modeling of individuals and groups through the joint processing of multiple information channels, and the structuring, indexing, and summarizing of these multimodal communication scenes. Projects such as AMI/AMIDA (www.amiproject.org) have made significant progress in these basic areas. AMI/AMIDA research revolves around instrumented meeting rooms and advanced videoconferencing systems, which enable the collection, annotation, structuring, and browsing of multimodal meeting recordings.