Security Concerns in Generative AI Coding Assistants: Insights from Online Discussions on GitHub Copilot

Generative Artificial Intelligence (GenAI) has become a central component of many development tools (e.g., GitHub Copilot) that support software practitioners across multiple programming tasks, including code completion, documentation, and bug detection. However, current research has identified significant limitations and open issues in GenAI concerning reliability, non-determinism, bias, and copyright infringement. While prior work has primarily focused on assessing the technical performance of these technologies for code generation, less attention has been paid to the emerging concerns of software developers, particularly regarding security. OBJECTIVE: This work explores security concerns about the use of GenAI-based coding assistants by analyzing the challenges voiced by developers and software enthusiasts in public online forums. METHOD: We retrieved posts, comments, and discussion threads addressing security issues in GitHub Copilot from three popular platforms: Stack Overflow, Reddit, and Hacker News. These discussions were clustered using BERTopic and then synthesized using thematic analysis to identify distinct categories of security concerns. RESULTS: Four major areas of concern were identified: potential data leakage, code licensing, adversarial attacks (e.g., prompt injection), and insecure code suggestions. These discussions underscore critical reflections on the limitations and trade-offs of GenAI in software engineering. IMPLICATIONS: Our findings contribute to a broader understanding of how developers perceive and engage with GenAI-based coding assistants, while highlighting key areas for improving their built-in security features.
