Demystifying Practices, Challenges and Expected Features of Using GitHub Copilot

Source: arXiv. Preprint accepted for publication in the International Journal of Software Engineering and Knowledge Engineering, 2023. arXiv admin note: substantial text overlap with arXiv:2303.08733

With advances in machine learning, there is growing interest in AI-enabled tools for autocompleting source code. GitHub Copilot, which has been trained on billions of lines of open source GitHub code, is one such tool and has been increasingly used since its launch in June 2021. However, little effort has been devoted to understanding, from the practitioners' point of view, the practices, challenges, and expected features of using Copilot to auto-complete source code. To this end, we conducted an empirical study by collecting and analyzing data from Stack Overflow (SO) and GitHub Discussions. We searched for and manually collected 303 SO posts and 927 GitHub discussions related to the usage of Copilot. We identified the programming languages, Integrated Development Environments (IDEs), and technologies used with Copilot, as well as the functions implemented, benefits, limitations, and challenges of using it. The results show that when practitioners use Copilot: (1) the major programming languages used with Copilot are JavaScript and Python, (2) the main IDE used with Copilot is Visual Studio Code, (3) the most commonly used technology with Copilot is Node.js, (4) the leading function implemented by Copilot is data processing, (5) the main purpose of using Copilot is to help generate code, (6) the most significant benefit of using Copilot is useful code generation, (7) the main limitation practitioners encounter when using Copilot is difficulty of integration, and (8) the most commonly expected feature is integration of Copilot with more IDEs. Our results suggest that Copilot is a double-edged sword, and developers need to carefully weigh various aspects when deciding whether to use it. Our study provides empirically grounded foundations that could inform developers and practitioners, and provides a basis for future investigations.
