We present an analysis of 12 million privacy-relevant reviews publicly visible on the Google Play Store, spanning a 10-year period. By leveraging state-of-the-art NLP techniques, we examine what users have been writing about privacy along multiple dimensions: time, countries, app types, diverse privacy topics, and even a spectrum of emotions. We find consistent growth in privacy-relevant reviews, and explore topics that are trending (such as Data Deletion and Data Theft) as well as those on the decline (such as privacy-relevant reviews on sensitive permissions). Although privacy reviews come from more than 200 countries, 33 countries account for 90% of them. We compare countries by examining the distribution of privacy topics that each country's users write about, and find that geographic proximity is not a reliable indicator of similar privacy perspectives. We uncover some countries with unique patterns and explore those herein. Surprisingly, it is not uncommon for reviews that discuss privacy to be positive (32%); many users express pleasure about privacy features within apps or about privacy-focused apps. We also uncover some unexpected behaviors, such as the use of reviews to deliver privacy disclaimers to developers. Finally, we demonstrate the value of analyzing app reviews with our approach as a complement to existing methods for understanding users' perspectives about privacy.