Context: As mobile applications (Apps) become pervasive in society and daily life, they constantly request personal information in exchange for more intelligent and customized functionality. An increasing number of users are voicing privacy concerns through reviews on App stores. Objective: The main challenge in mining privacy concerns from user reviews is that reviews expressing privacy concerns are vastly outnumbered by reviews on more generic themes and by noisy content. In this work, we propose a novel automated approach to overcome this challenge. Method: Our approach first uses information retrieval and document embeddings to extract candidate privacy reviews in an unsupervised manner; these candidates are then labeled to build an annotated dataset. Next, supervised classifiers are trained to automatically identify privacy reviews. Finally, we design an interpretable topic mining algorithm to detect the privacy concern topics contained in the privacy reviews. Results: Experimental results show that the best-performing document embedding achieves an average precision of 96.80% over the top 100 retrieved candidate privacy reviews. All of the trained privacy review classifiers achieve an F1 score above 91%, outperforming a recent keyword-matching baseline by a maximum F1 margin of 7.5%. For detecting privacy concern topics in privacy reviews, our proposed algorithm achieves better topic coherence and diversity than three strong topic modeling baselines, including LDA. Conclusion: The empirical evaluation demonstrates the effectiveness of our approach in identifying privacy reviews and detecting the user privacy concerns expressed in App reviews.
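The retrieval step described under Method can be illustrated with a minimal sketch. This is not the approach evaluated in this work: as a loudly stated assumption, a plain term-frequency vector stands in for a learned document embedding, and the query, the example reviews, and the helper names (`embed`, `cosine`, `retrieve`, `precision_at_k`) are all hypothetical. The sketch ranks reviews by cosine similarity to a privacy-related query and reports precision over the top k results, a simpler relative of the average precision reported under Results.

```python
import math
import re
from collections import Counter


def embed(text):
    """Toy stand-in for a document embedding: a sparse term-frequency vector."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse term-frequency vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query, reviews, k):
    """Return the k reviews most similar to the query, best first."""
    q = embed(query)
    ranked = sorted(reviews, key=lambda r: cosine(q, embed(r)), reverse=True)
    return ranked[:k]


def precision_at_k(retrieved, relevant):
    """Fraction of the retrieved reviews that are labeled as privacy reviews."""
    if not retrieved:
        return 0.0
    return sum(1 for r in retrieved if r in relevant) / len(retrieved)


# Hypothetical reviews; privacy reviews are the minority, as in real App stores.
reviews = [
    "This app collects my personal data without asking permission",
    "Great game, lots of fun levels",
    "Why does a flashlight need access to my contacts and location data",
    "Crashes every time I open it",
    "Love the new update",
]
relevant = {reviews[0], reviews[2]}  # hypothetical manual labels

# Hypothetical privacy query used to seed the unsupervised retrieval.
query = "personal data privacy permission access location contacts"
top2 = retrieve(query, reviews, k=2)
p_at_2 = precision_at_k(top2, relevant)
print(top2, p_at_2)
```

In a pipeline of this shape, the top-ranked candidates would be handed to annotators, and the resulting labels would then train the supervised classifiers.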