A Dataset of Low-Rated Applications from the Amazon Appstore for User Feedback Analysis

In today's digital landscape, end-user feedback plays a crucial role in the evolution of software applications, particularly in addressing issues that hinder the user experience. While much research has focused on high-rated applications, low-rated applications remain largely unexplored, despite their potential to reveal valuable insights. This study introduces a novel dataset curated from 64 low-rated applications on the Amazon Software Appstore (ASA), containing 79,821 user reviews. The dataset is designed to capture the issues users report most frequently, which are critical for improving software quality. To further enhance the dataset's utility, a subset of 6,000 reviews was manually annotated into six distinct issue categories: user interface (UI) and user experience (UX), functionality and features, compatibility and device specificity, performance and stability, customer support and responsiveness, and security and privacy. This annotated subset is a valuable resource for developing machine learning-based approaches that automate the classification of user feedback by issue type. Making both the annotated and raw datasets publicly available gives researchers and developers a tool for understanding common issues in low-rated apps and for informing software improvements. The analysis and availability of this dataset lay the groundwork for data-driven solutions that improve software quality based on user feedback. The dataset also gives software vendors and researchers opportunities to explore other software evolution-related topics, including frequently missing features, sarcasm, and associated emotions, which can help explain why these apps receive comparatively low ratings.
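As an illustration only, the six-category annotation scheme and the corpus sizes reported above can be modeled as follows. This is a minimal sketch, not the dataset's released format. The names `IssueCategory`, `AnnotatedReview`, `annotated_fraction`, and `label_distribution`, along with the record fields, are hypothetical. The sketch also assumes each review carries exactly one label, which the abstract does not state.

```python
from dataclasses import dataclass
from enum import Enum


class IssueCategory(Enum):
    """The six issue categories named in the abstract (identifiers are hypothetical)."""
    UI_UX = "user interface and user experience"
    FUNCTIONALITY = "functionality and features"
    COMPATIBILITY = "compatibility and device specificity"
    PERFORMANCE = "performance and stability"
    SUPPORT = "customer support and responsiveness"
    SECURITY_PRIVACY = "security and privacy"


@dataclass(frozen=True)
class AnnotatedReview:
    # Hypothetical record layout; assumes a single label per review.
    # The released files may use different fields or multi-label annotations.
    app_name: str
    text: str
    label: IssueCategory


# Corpus sizes as reported in the abstract.
NUM_APPS = 64
NUM_REVIEWS = 79_821
NUM_ANNOTATED = 6_000


def annotated_fraction(annotated: int = NUM_ANNOTATED, total: int = NUM_REVIEWS) -> float:
    """Share of all collected reviews that carry a manual issue label."""
    return annotated / total


def label_distribution(reviews: list[AnnotatedReview]) -> dict[IssueCategory, int]:
    """Count reviews per category, reporting zero for categories with no reviews."""
    counts = {category: 0 for category in IssueCategory}
    for review in reviews:
        counts[review.label] += 1
    return counts


if __name__ == "__main__":
    print(f"{annotated_fraction():.1%} of reviews are annotated")  # prints "7.5% of reviews are annotated"
    sample = [
        AnnotatedReview("ExampleApp", "Crashes every time I open it", IssueCategory.PERFORMANCE),
        AnnotatedReview("ExampleApp", "Buttons are too small to tap", IssueCategory.UI_UX),
    ]
    print(label_distribution(sample))
```

The arithmetic shows that the manually labeled subset covers roughly 7.5% of the 79,821 collected reviews. Counting every category, including those with zero reviews, makes class imbalance visible before a classifier is trained on the labels.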

