API misuse in code generated by large language models (LLMs) is a serious emerging challenge in software development. While LLMs have demonstrated impressive code generation capabilities, their interactions with complex library APIs remain highly error-prone, potentially leading to software failures and security vulnerabilities. This paper presents the first comprehensive study of API misuse patterns in LLM-generated code, analyzing both method selection and parameter usage across Python and Java. Through extensive manual annotation of 3,892 method-level and 2,560 parameter-level misuses, we develop a novel taxonomy of four API misuse types specific to LLMs, which differ significantly from traditional, human-centric misuse patterns. Our evaluation of two widely used LLMs, StarCoder-7B (open-source) and Copilot (closed-source), reveals significant challenges in API usage, particularly hallucination and intent misalignment. Building on this taxonomy, we propose Dr.Fix, a novel LLM-based automatic program repair approach for API misuse. It substantially improves repair accuracy for real-world API misuse, raising BLEU scores by up to 38.4 points and exact match rates by up to 40 percentage points across models and programming languages. This work provides crucial insights into the limitations of current LLMs in API usage and presents an effective solution for the automated repair of API misuse in LLM-generated code.
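To make the notion of API misuse concrete, the sketch below is a hypothetical illustration of the hallucination pattern named above (it is not drawn from the paper's dataset): an invented pandas method, `remove_missing`, stands in for a hallucinated call, contrasted with the real API call that fulfills the same intent.

```python
# Hypothetical illustration of API-misuse-by-hallucination (not from the paper's data).
import pandas as pd

df = pd.DataFrame({"price": [3.0, None, 7.5]})

# Hallucinated call: pandas.DataFrame has no `remove_missing` method,
# so this line would raise AttributeError if uncommented.
# cleaned = df.remove_missing()

# Correct API usage for the same intent: drop rows containing missing values.
cleaned = df.dropna()
print(cleaned)
```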