Beyond Keywords: A Context-based Hybrid Approach to Mining Ethical Concern-related App Reviews

With the growing presence of mobile applications in everyday life, ethical concerns surrounding them have increased significantly. Users generally communicate their feedback, report issues, and suggest new functionalities in application (app) reviews, frequently emphasizing safety, privacy, and accountability concerns. Incorporating these reviews is essential to developing successful products. However, app reviews related to ethical concerns generally use domain-specific language and a more varied vocabulary, which makes automatically extracting them a challenging and time-consuming effort. This study proposes a novel Natural Language Processing (NLP) based approach that combines Natural Language Inference (NLI), which provides a deep comprehension of language nuances, with a decoder-only (LLaMA-like) Large Language Model (LLM) to extract ethical concern-related app reviews at scale. Using 43,647 app reviews from the mental health domain, the proposed methodology 1) evaluates four NLI models to extract potential privacy reviews and compares the results of domain-specific privacy hypotheses with those of generic privacy hypotheses; 2) evaluates four LLMs for classifying app reviews into privacy concerns; and 3) uses the best-performing NLI model and LLM to extract new privacy reviews from the dataset. Results show that the DeBERTa-v3-base-mnli-fever-anli NLI model with domain-specific hypotheses yields the best performance, and that the Llama3.1-8B-Instruct LLM performs best in classifying app reviews. Using NLI+LLM, an additional 1,008 privacy-related reviews were then extracted that the keyword-based approach of previous research had not identified, demonstrating the effectiveness of the proposed approach.
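The abstract describes a two-stage filter: an NLI model scores each review against a set of privacy hypotheses, and only the surviving candidates are passed to an LLM classifier. The Python sketch below illustrates that control flow under stated assumptions; it is not the authors' implementation. The scorer and classifier are hypothetical callables, the rule "keep a review if any hypothesis reaches the entailment threshold" is an assumed aggregation, and the word-overlap `toy_score` and fixed answer table `TOY_LLM_ANSWERS` are placeholders standing in for DeBERTa-v3-base-mnli-fever-anli and Llama3.1-8B-Instruct only so that the example runs offline. The hypotheses and reviews are illustrative, not taken from the study.

```python
import re
from typing import Callable, Iterable, List, Sequence

# Hypothetical interfaces (assumptions, not from the paper):
#   an entailment scorer maps (premise, hypothesis) -> P(entailment) in [0, 1];
#   a classifier maps a review -> True if it raises a privacy concern.
EntailmentScorer = Callable[[str, str], float]
Classifier = Callable[[str], bool]


def nli_candidates(reviews: Sequence[str], hypotheses: Sequence[str],
                   score: EntailmentScorer, threshold: float) -> List[str]:
    """Stage 1: keep a review if it entails at least one hypothesis at or above threshold."""
    return [r for r in reviews
            if max((score(r, h) for h in hypotheses), default=0.0) >= threshold]


def hybrid_extract(reviews: Sequence[str], hypotheses: Sequence[str],
                   score: EntailmentScorer, classify: Classifier,
                   threshold: float = 0.5) -> List[str]:
    """Stage 2: send only the NLI candidates to the (more expensive) classifier."""
    candidates = nli_candidates(reviews, hypotheses, score, threshold)
    return [r for r in candidates if classify(r)]


def keyword_hits(reviews: Iterable[str], keywords: Iterable[str]) -> List[str]:
    """Keyword baseline: reviews containing any keyword as a whole word."""
    kws = {k.lower() for k in keywords}
    return [r for r in reviews if kws & set(re.findall(r"[a-z']+", r.lower()))]


def beyond_keywords(extracted: Iterable[str], baseline: Iterable[str]) -> List[str]:
    """Reviews found by the hybrid pipeline but missed by the keyword baseline."""
    seen = set(baseline)
    return [r for r in extracted if r not in seen]


# ---- Toy stand-ins so the sketch runs without any model download ----

def toy_score(premise: str, hypothesis: str) -> float:
    """Word-overlap ratio; a placeholder for a real NLI entailment probability."""
    p = set(re.findall(r"[a-z']+", premise.lower()))
    h = set(re.findall(r"[a-z']+", hypothesis.lower()))
    return len(p & h) / len(h) if h else 0.0


# Placeholder for an instruct LLM's yes/no judgment: a fixed answer table.
TOY_LLM_ANSWERS = {
    "This app sold my personal data to advertisers": True,
    "The app keeps tracking my location": True,
    "The app helped me track my mood": False,
}


def toy_classify(review: str) -> bool:
    return TOY_LLM_ANSWERS.get(review, False)


if __name__ == "__main__":
    reviews = [
        "This app sold my personal data to advertisers",
        "Great meditation exercises, very calming",
        "The app keeps tracking my location",
        "The app helped me track my mood",
    ]
    # Illustrative hypotheses only, not the ones used in the study.
    hypotheses = ["The app shares my personal data.", "The app tracks my location."]

    found = hybrid_extract(reviews, hypotheses, toy_score, toy_classify)
    print(found)  # the data-selling and location-tracking reviews
    print(beyond_keywords(found, keyword_hits(reviews, ["privacy", "data"])))
```

In this toy run, the mood-tracking review passes the NLI stage but is rejected by the classifier stand-in, and the location-tracking review is reported as found beyond the keyword baseline. In a real setup, `toy_score` would be replaced by a function returning an NLI model's entailment probability and `toy_classify` by a prompted call to an instruct LLM; the final set difference corresponds to the abstract's comparison against the earlier keyword-based approach.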
