With the rapid development of large language models for code generation, AI-powered editors such as GitHub Copilot and Cursor are revolutionizing software development practices. At the same time, studies have identified potential defects in the code they generate. Prior research has predominantly examined how code context influences the generation of defective code, often overlooking the impact of defects within commented-out (CO) code. How AI coding assistants interpret CO code in a prompt affects the code they generate. This study evaluates how two AI coding assistants, GitHub Copilot and Cursor, are influenced by defective CO code. The experimental results show that defective CO code in the context causes the assistants to generate more defective code, with defect rates reaching up to 58.17%. Our findings further demonstrate that the tools do not simply copy the defective code from the context; instead, they actively reason to complete incomplete defect patterns and continue to produce defective code despite distractions such as incorrect indentation or tags. Even with explicit instructions to ignore the defective CO code, the reduction in defects does not exceed 21.84%. These findings underscore the need for improved robustness and security measures in AI coding assistants.