Inclusiveness Matters: A Large-Scale Analysis of User Feedback

In an era of rapidly expanding software usage, catering to the diverse needs of users from various backgrounds has become a critical challenge. Inclusiveness, a core human value, is frequently overlooked during software development, leading to user dissatisfaction. Users often voice their concerns in discussions on online platforms. In this study, we leverage user feedback from three popular online sources, Reddit, Google Play Store, and Twitter, for 50 of the most popular apps in the world to reveal the inclusiveness-related concerns of end users. Using a Socio-Technical Grounded Theory approach, we analyzed 23,107 posts across the three sources and identified 1,211 inclusiveness-related posts. We organize our empirical results in a taxonomy for inclusiveness comprising six major categories: Fairness, Technology, Privacy, Demography, Usability, and Other Human Values. To explore automated support for identifying inclusiveness-related posts, we experimented with five state-of-the-art pre-trained large language models (LLMs) and found that their effectiveness is high but varies with the data source: GPT-2 performed best on Reddit, BERT on the Google Play Store, and BART on Twitter. Our study provides an in-depth view of inclusiveness-related user feedback from the most popular apps and online sources. We provide implications and recommendations that can help bridge the gap between user expectations and software, so that software developers can respond to the varied and evolving needs of a wide spectrum of users.
