Large language models (LLMs) have achieved exceptional performance in code generation. However, their performance remains unsatisfactory on library-oriented code generation, especially for libraries absent from the LLMs' training data. Previous work uses API recommendation to help LLMs use libraries: it retrieves APIs related to the user's requirement, then supplies them as context when prompting the LLM. However, development requirements can be coarse-grained, requiring a combination of multiple fine-grained APIs. This mismatch in granularity makes API recommendation a challenging task. To address this, we propose CAPIR (Compositional API Recommendation), which adopts a "divide-and-conquer" strategy to recommend APIs for coarse-grained requirements. Specifically, CAPIR employs an LLM-based Decomposer to break down a coarse-grained task description into several detailed subtasks. CAPIR then applies an embedding-based Retriever to identify the APIs relevant to each subtask. Finally, CAPIR uses an LLM-based Reranker to filter out redundant APIs and produce the final recommendation. To facilitate the evaluation of API recommendation methods on coarse-grained requirements, we present two challenging benchmarks, RAPID (Recommend APIs based on Documentation) and LOCG (Library-Oriented Code Generation). Experimental results on these benchmarks demonstrate the effectiveness of CAPIR in comparison to existing baselines. Specifically, on RAPID's Torchdata-AR dataset, compared to the state-of-the-art API recommendation approach, CAPIR improves recall@5 from 18.7% to 43.2% and precision@5 from 15.5% to 37.1%. On LOCG's Torchdata-Code dataset, compared to code generation without API recommendation, CAPIR improves pass@100 from 16.0% to 28.0%.
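The decompose, retrieve, and rerank flow described above can be illustrated with a minimal sketch. This is not the CAPIR implementation: the LLM-based Decomposer is replaced by a simple text splitter, the embedding model by a bag-of-words vector, and the LLM-based Reranker by score-based deduplication. The `toylib` API names, their descriptions, and all function names and parameters are hypothetical and exist only for illustration.

```python
import math
import re
from collections import Counter

# Hypothetical toy API documentation; names and descriptions are invented.
API_DOCS = {
    "toylib.list_files": "list files in a directory",
    "toylib.read_csv": "read rows from csv files",
    "toylib.shuffle": "shuffle rows randomly",
    "toylib.batch": "group rows into batches",
}


def embed(text):
    # Stand-in for a learned embedding model: a bag-of-words count vector.
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a, b):
    # Cosine similarity between two sparse count vectors.
    dot = sum(a[t] * b[t] for t in a)
    norm_sq = sum(v * v for v in a.values()) * sum(v * v for v in b.values())
    return dot / math.sqrt(norm_sq) if norm_sq else 0.0


def decompose(task):
    # Stand-in for the LLM-based Decomposer: split on commas, "then", "and".
    parts = re.split(r",|\bthen\b|\band\b", task)
    return [p.strip() for p in parts if p.strip()]


def retrieve(subtask, k=2):
    # Embedding-based retrieval: score every API doc against one subtask.
    query = embed(subtask)
    scored = [(cosine(query, embed(doc)), name) for name, doc in API_DOCS.items()]
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [(score, name) for score, name in scored[:k] if score > 0]


def rerank(candidates, top_n=3):
    # Stand-in for the LLM-based Reranker: keep each API once, at its best score.
    best = {}
    for score, name in candidates:
        best[name] = max(score, best.get(name, 0.0))
    ranked = sorted(best.items(), key=lambda pair: (-pair[1], pair[0]))
    return [name for name, _ in ranked[:top_n]]


def recommend(task, k=2, top_n=3):
    # Divide and conquer: retrieve per subtask, then merge and filter.
    candidates = []
    for subtask in decompose(task):
        candidates.extend(retrieve(subtask, k))
    return rerank(candidates, top_n)


if __name__ == "__main__":
    task = ("list files in a directory, then read rows from csv files "
            "and shuffle rows randomly")
    print(decompose(task))
    print(recommend(task))
```

Retrieving per subtask rather than for the whole task is the point of the sketch: each fine-grained query matches a single API description closely, whereas one query built from the full coarse-grained description would dilute every match.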
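For readers unfamiliar with the reported metrics, the sketch below shows one common way to compute recall@k, precision@k, and pass@k. The abstract does not specify the exact formulas used, so these definitions are assumptions: recall@k and precision@k follow the standard top-k set-overlap definitions, and pass@k uses the widely used unbiased estimator 1 - C(n-c, k) / C(n, k) for n generated samples of which c pass the tests. The function names are illustrative.

```python
from math import comb


def recall_at_k(recommended, relevant, k):
    # Fraction of the ground-truth APIs that appear in the top-k recommendations.
    relevant = set(relevant)
    if not relevant:
        return 0.0
    hits = len(set(recommended[:k]) & relevant)
    return hits / len(relevant)


def precision_at_k(recommended, relevant, k):
    # Fraction of the top-k slots occupied by ground-truth APIs.
    hits = len(set(recommended[:k]) & set(relevant))
    return hits / k


def pass_at_k(n, c, k):
    # Unbiased estimate of the chance that at least one of k samples passes,
    # given n generated samples of which c are correct (assumed estimator).
    if n - c < k:
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)


if __name__ == "__main__":
    recommended = ["a", "b", "c", "d", "e"]
    relevant = ["a", "c", "z"]
    print(recall_at_k(recommended, relevant, 5))
    print(precision_at_k(recommended, relevant, 5))
    print(pass_at_k(2, 1, 1))
```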