Energy inefficiency in apps can be a major issue for users and is discussed extensively in app store reviews. Previous research has shown the importance of investigating energy-related app reviews to identify the major causes or categories of energy-related user feedback. However, no existing study efficiently extracts energy-related app reviews automatically. In this paper, we empirically study different techniques for the automatic extraction of energy-related user feedback. We compare the accuracy, F1-score, and run time of numerous machine learning models with relevant feature combinations against relatively modern neural network-based models. In total, 60 machine learning models are compared to 30 models that we build using six neural network architectures and three word embedding models. We develop a visualization tool for this study through which a developer can explore this large-scale result set. The results show that neural networks outperform the other machine learning techniques, achieving the highest F1-score of 0.935. To support replication of the results, we have open-sourced the interactive visualization tool. After identifying the best results and extracting the energy-related reviews, we further compare various techniques to help developers automatically investigate the emerging issues that might be responsible for the energy inefficiency of apps. We compare the previously used string matching with results obtained from two state-of-the-art topic modeling algorithms, OBTM and AOLDA. Finally, we conduct a qualitative study in collaboration with developers and students from different institutions to determine their preferences for identifying necessary topics from previously categorized reviews; the study shows that OBTM produces the most helpful results.
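As a rough illustration of the evaluation described above, the following minimal sketch scores a keyword-based string-matching extractor by accuracy, F1-score, and run time. It is not the method used in the paper: the keyword list, the toy reviews and their labels, and the function names `is_energy_related`, `accuracy_and_f1`, and `evaluate` are all assumptions introduced here for illustration only.

```python
import time

# Hypothetical keyword list (an assumption; the actual matching terms are not given here).
ENERGY_KEYWORDS = ("battery", "drain", "power", "energy", "overheat")

# Toy, made-up reviews with hand-assigned labels (True = energy related).
REVIEWS = [
    ("This app drains my battery in two hours.", True),
    ("Love the new dark theme!", False),
    ("Phone gets hot and loses charge fast since the update.", True),
    ("Crashes every time I open settings.", False),
    ("Great app, but the power usage is too high.", True),
    ("Powerful editing features, well worth it.", False),
]


def is_energy_related(review: str) -> bool:
    """Flag a review if it contains any keyword (case-insensitive substring match)."""
    text = review.lower()
    return any(keyword in text for keyword in ENERGY_KEYWORDS)


def accuracy_and_f1(y_true, y_pred):
    """Return (accuracy, F1) for binary labels, with True as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t and p)
    fp = sum(1 for t, p in pairs if not t and p)
    fn = sum(1 for t, p in pairs if t and not p)
    correct = sum(1 for t, p in pairs if t == p)
    accuracy = correct / len(pairs)
    denominator = 2 * tp + fp + fn
    f1 = 2 * tp / denominator if denominator else 0.0
    return accuracy, f1


def evaluate(reviews):
    """Classify every review and return (accuracy, F1, elapsed seconds)."""
    start = time.perf_counter()
    predictions = [is_energy_related(text) for text, _ in reviews]
    elapsed = time.perf_counter() - start
    labels = [label for _, label in reviews]
    accuracy, f1 = accuracy_and_f1(labels, predictions)
    return accuracy, f1, elapsed


if __name__ == "__main__":
    acc, f1, seconds = evaluate(REVIEWS)
    print(f"accuracy={acc:.3f} f1={f1:.3f} run_time={seconds:.6f}s")
```

On this toy data, substring matching misses the review that never names a keyword and wrongly flags "Powerful", which hints at why learned classifiers may be preferable for this extraction task.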