Smart speakers collect voice commands, which can be used to infer sensitive information about users. Given the potential for privacy harms, there is a need for greater transparency and control over the data collected, used, and shared by smart speaker platforms and by the third-party skills they support. To bridge this gap, we build a framework to measure data collection, usage, and sharing by smart speaker platforms. We apply our framework to the Amazon smart speaker ecosystem. Our results show that Amazon and third parties, including advertising and tracking services that are unique to the smart speaker ecosystem, collect smart speaker interaction data. We also find that Amazon processes smart speaker interaction data to infer user interests and uses those inferences to serve targeted ads to users. Smart speaker interaction also leads to ad targeting by third-party advertisers and to bids in ad auctions that are as much as 30X higher. Finally, we find that the data practices of Amazon and of third-party skills are often not clearly disclosed in their policy documents.