Deep Learning Meets Private Talk: Conversational AI Can Predict Speaker Traits by Eavesdropping for Only 30 Seconds

Abstract
Conversational AI such as smart speakers placed in home environments can accidentally activate and record people’s talk for a short time. What can such devices learn about people by listening in on ongoing conversations? Taking two commonly used speaker traits as an example, we present the results of an experiment that simulates Conversational AI eavesdropping on ongoing talk using transcriptions of naturalistic conversations in private settings. We show that a currently popular type of deep learning-based system can reliably predict if a speaker is “young”, “old”, “female” or “male” (age=99%, gender=82%) based on what they say in around 30 seconds. Our results exemplify how powerful current big data language models are when it comes to data-driven predictions of personal information based on how people talk, even when listening only for a short time. We conclude the experiment with a critical comment on the increasingly pervasive use of such user modeling technology to compute speaker traits, touching upon some potential ethical concerns, bias, and privacy issues.
