What Do Users Ask in Open-Source AI Repositories? An Empirical Study of GitHub Issues

Artificial Intelligence (AI) systems, which benefit from the availability of large-scale datasets and increasing computational power, have become effective solutions to various critical tasks, such as natural language understanding, speech recognition, and image processing. The advancement of these AI systems is inseparable from open-source software (OSS). This paper presents an empirical study of the issues filed in open-source AI repositories, with the aim of helping developers understand the problems that arise when employing AI systems. We collect 576 repositories from the PapersWithCode platform and retrieve 24,953 issues from them using the GitHub REST APIs. Our empirical study includes three phases. First, we manually analyze these issues to categorize the problems that developers are likely to encounter in open-source AI repositories. Specifically, we provide a taxonomy of 13 categories related to AI systems. The two most common issue categories are runtime errors (23.18%) and unclear instructions (19.53%). Second, we find that 67.5% of issues are closed, and that half of these closed issues are resolved within four days. Moreover, issue management features, e.g., labeling and assignment, are not widely adopted in open-source AI repositories. In particular, only 7.81% of repositories label issues and only 5.9% assign issues to assignees. Finally, we empirically show that employing GitHub issue management features and writing issues with detailed descriptions facilitate the resolution of issues. Based on our findings, we make recommendations for developers to help better manage the issues of open-source AI repositories and improve their quality.
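The abstract does not describe the collection scripts, so the following is only a minimal sketch of how issue data of this kind could be gathered and summarized, not the authors' method. It assumes the public GitHub REST endpoint `GET /repos/{owner}/{repo}/issues` with `state=all`; the function names `fetch_issues` and `summarize`, the `max_pages` cap, and the per-issue (rather than per-repository) labeled and assigned ratios are illustrative assumptions.

```python
import json
import statistics
import urllib.request
from datetime import datetime

API = "https://api.github.com"


def fetch_issues(owner, repo, token=None, max_pages=10):
    """Fetch open and closed issues of one repository, page by page.

    Hypothetical helper: max_pages is an arbitrary cap for illustration.
    """
    issues = []
    for page in range(1, max_pages + 1):
        url = (f"{API}/repos/{owner}/{repo}/issues"
               f"?state=all&per_page=100&page={page}")
        req = urllib.request.Request(
            url, headers={"Accept": "application/vnd.github+json"})
        if token:
            req.add_header("Authorization", f"Bearer {token}")
        with urllib.request.urlopen(req) as resp:
            batch = json.load(resp)
        if not batch:
            break
        # The issues endpoint also returns pull requests; skip them.
        issues.extend(i for i in batch if "pull_request" not in i)
    return issues


def _parse(timestamp):
    """Parse the ISO 8601 UTC timestamps used in issue records."""
    return datetime.strptime(timestamp, "%Y-%m-%dT%H:%M:%SZ")


def summarize(issues):
    """Compute closure and issue-management statistics for a list of issues."""
    closed = [i for i in issues
              if i.get("state") == "closed" and i.get("closed_at")]
    days = [(_parse(i["closed_at"]) - _parse(i["created_at"])).total_seconds()
            / 86400 for i in closed]
    n = len(issues)
    return {
        "total": n,
        "closed_ratio": len(closed) / n if n else 0.0,
        "median_days_to_close": statistics.median(days) if days else None,
        # Per-issue ratios; the study reports per-repository adoption instead.
        "labeled_ratio": sum(1 for i in issues if i.get("labels")) / n if n else 0.0,
        "assigned_ratio": sum(1 for i in issues if i.get("assignees")) / n if n else 0.0,
    }
```

Calling `summarize(fetch_issues(owner, repo))` for each collected repository would give per-repository closure ratios and median resolution times comparable in spirit to the figures above. Unauthenticated requests are rate-limited, so a token would likely be needed for hundreds of repositories.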

