In today's digital landscape, end-user feedback plays a crucial role in the evolution of software applications, particularly in addressing issues that hinder the user experience. While much research has focused on high-rated applications, low-rated applications remain largely unexplored despite their potential to reveal valuable insights. This study introduces a novel dataset of 79,821 user reviews curated from 64 low-rated applications in the Amazon Software Appstore (ASA). The dataset is designed to capture the issues that users report most frequently, which are critical for improving software quality. To further enhance the dataset's utility, a subset of 6,000 reviews was manually annotated into six distinct issue categories: user interface (UI) and user experience (UX), functionality and features, compatibility and device specificity, performance and stability, customer support and responsiveness, and security and privacy. This annotated dataset is a valuable resource for developing machine learning-based approaches that automate the classification of user feedback into issue types. Making both the annotated and raw datasets publicly available gives researchers and developers a tool for understanding common issues in low-rated apps and informing software improvements. The analysis and availability of this dataset lay the groundwork for data-driven solutions that improve software quality based on user feedback. Additionally, the dataset offers software vendors and researchers opportunities to explore various software evolution-related activities, including identifying frequently missing features, detecting sarcasm, and analyzing associated emotions, which will help to better explain comparatively low app ratings.