Public Perceptions Towards Synthetic Voice Technology

Abstract
Text-to-Speech (TTS) technologies have provided ways to produce acoustic approximations of human voices. However, recent advancements in machine learning (i.e., neural network TTS) have helped move beyond coarse mimicry and towards more natural-sounding speech. With only a small collection of recorded utterances, it is now possible to generate wholly synthetic voices indistinguishable from those of human speakers. While these new approaches to speech synthesis can facilitate more seamless experiences with artificial agents, they also lower the barrier to entry for those seeking to perpetrate deception. As such, in developing these technologies, it is important to anticipate potential harms and devise strategies to mitigate misuse. This paper presents findings from a 360-person survey that assessed public perceptions of synthetic voices, with a particular focus on how voice type and social scenario affect ratings of trust. The findings have implications for the responsible deployment of synthetic speech technologies.
