Demystifying the Vetting Process of Voice-controlled Skills on Markets

Abstract
Smart speakers, such as Google Home and Amazon Echo, have become popular. They execute users' voice commands through built-in functionality and through third-party voice-controlled applications, called skills. Malicious skills pose significant security and privacy threats to users. As a countermeasure, only skills that pass a strict vetting process can be released onto markets. However, malicious skills have still been reported on markets, indicating that the vetting process can be bypassed. This paper aims to demystify the vetting process of skills on the major markets in order to discover its weaknesses and better protect the markets. To probe the vetting process, we carefully design numerous skills, perform the Turing test (a test for machine intelligence) to determine whether humans or machines perform the vetting, and leverage natural language processing techniques to analyze the testers' behavior. Based on comprehensive experiments, we gain a good understanding of the vetting process (e.g., whether testers are machines or humans, and which strategies they use to explore skills) and discover several weaknesses. To verify our findings, we design three types of attacks and demonstrate that an attacker can embed sensitive behaviors in skills and still bypass the strict vetting process. Accordingly, we also propose countermeasures to these attacks and weaknesses.
Funding Information
  • Youth Innovation Promotion Association CAS
  • Huawei
  • National Key R&D Program of China (2020AAA0105200)
  • Beijing Natural Science Foundation (No.JQ18011)
  • National Top-notch Youth Talents Program of China
  • NSFC (U1836211)
  • Beijing Academy of Artificial Intelligence
