Fairness is one of the socio-technical concerns that must be addressed in AI-based systems. Unfair AI-based systems, particularly unfair AI-based mobile apps, can pose difficulties for a significant proportion of the global population. This paper aims to analyze fairness concerns in AI-based app reviews. We first manually constructed a ground-truth dataset, including a statistical sample of fairness and non-fairness reviews. Leveraging the ground-truth dataset, we developed and evaluated a set of machine learning and deep learning classifiers that distinguish fairness reviews from non-fairness reviews. Our experiments show that our best-performing classifier can detect fairness reviews with a precision of 94%. We then applied the best-performing classifier to approximately 9.5M reviews collected from 108 AI-based apps and identified around 92K fairness reviews. Next, applying the K-means clustering technique to the 92K fairness reviews, followed by manual analysis, led to the identification of six distinct types of fairness concerns (e.g., 'receiving different quality of features and services in different platforms and devices' and 'lack of transparency and fairness in dealing with user-generated content'). Finally, a manual analysis of 2,248 app owners' responses to the fairness reviews identified six root causes (e.g., 'copyright issues') that app owners report to justify fairness concerns.
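To make the two quantitative steps above concrete, the following is a minimal sketch of how precision is computed and how K-means groups reviews by textual similarity. It is not the pipeline used in the paper: the abstract does not state the classifier architecture or the feature representation, so the bag-of-words vectors, the toy labels and reviews, and the function names `precision`, `bag_of_words`, and `kmeans` are all assumptions made for illustration.

```python
from collections import Counter
import math


def precision(y_true, y_pred):
    """Fraction of items predicted as fairness reviews (1) that truly are."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    return tp / (tp + fp) if (tp + fp) else 0.0


def bag_of_words(texts):
    """Map each text to a term-count vector over a shared, sorted vocabulary.

    Assumption: a plain bag-of-words stands in for whatever features
    the paper actually used.
    """
    vocab = sorted({w for t in texts for w in t.lower().split()})
    vectors = []
    for t in texts:
        counts = Counter(t.lower().split())
        vectors.append([float(counts[w]) for w in vocab])
    return vectors


def kmeans(points, k, iterations=20):
    """Lloyd's algorithm; the first k points seed the centroids (toy choice)."""
    centroids = [list(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Update step: each centroid moves to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


# Toy ground truth and predictions (hypothetical, not data from the paper).
y_true = [1, 1, 0, 1, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0]
print(precision(y_true, y_pred))

# Toy reviews (hypothetical) loosely echoing two concern types from the abstract.
reviews = [
    "feature missing on android",
    "content removed without reason",
    "feature missing on ios",
    "content removed unfairly",
]
print(kmeans(bag_of_words(reviews), k=2))
```

In this toy run, two of the three reviews flagged as fairness reviews are correct, so precision is 2/3, and the platform-related reviews fall into one cluster while the content-moderation reviews fall into the other. In the study itself, the clusters produced by K-means were then interpreted through manual analysis, a step that code like this does not replace.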