Compositional API Recommendation for Library-Oriented Code Generation

Large language models (LLMs) have achieved exceptional performance in code generation. However, their performance remains unsatisfactory when generating library-oriented code, especially for libraries not present in the LLMs' training data. Previous work uses API recommendation to help LLMs use libraries: it retrieves APIs related to the user's requirements, then supplies them as context when prompting the LLM. However, development requirements can be coarse-grained, requiring a combination of multiple fine-grained APIs. This granularity mismatch makes API recommendation a challenging task. To address this, we propose CAPIR (Compositional API Recommendation), which adopts a "divide-and-conquer" strategy to recommend APIs for coarse-grained requirements. Specifically, CAPIR employs an LLM-based Decomposer to break down a coarse-grained task description into several detailed subtasks. Then, CAPIR applies an embedding-based Retriever to identify relevant APIs for each subtask. Finally, CAPIR leverages an LLM-based Reranker to filter out redundant APIs and provide the final recommendation. To facilitate the evaluation of API recommendation methods on coarse-grained requirements, we present two challenging benchmarks, RAPID (Recommend APIs based on Documentation) and LOCG (Library-Oriented Code Generation). Experimental results on these benchmarks demonstrate the effectiveness of CAPIR in comparison to existing baselines. Specifically, on RAPID's Torchdata-AR dataset, compared to the state-of-the-art API recommendation approach, CAPIR improves recall@5 from 18.7% to 43.2% and precision@5 from 15.5% to 37.1%. On LOCG's Torchdata-Code dataset, compared to code generation without API recommendation, CAPIR improves pass@100 from 16.0% to 28.0%.
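
To make the decompose, retrieve, rerank flow concrete, the following is a minimal sketch rather than the CAPIR implementation. Everything in it is an assumption made for illustration: the API names and descriptions in `API_DOCS` are hypothetical, the Decomposer is replaced by a simple rule-based splitter, the Retriever uses bag-of-words cosine similarity instead of a learned embedding model, and the Reranker is replaced by score-based deduplication instead of an LLM. The `recall_at_k` and `precision_at_k` helpers follow the standard definitions of those metrics.

```python
import math
import re
from collections import Counter

# Hypothetical API documentation (names and descriptions invented for this sketch).
API_DOCS = {
    "list_files": "list files in a directory",
    "open_files": "open files and yield file streams",
    "parse_csv": "parse csv rows from file streams",
    "shuffle": "shuffle items in random order",
    "batch": "group items into batches",
}

def embed(text):
    """Toy bag-of-words vector; a stand-in for a learned embedding model."""
    return Counter(re.findall(r"[a-z]+", text.lower()))

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0

def decompose(task):
    """Stand-in for the LLM-based Decomposer: split on commas and 'then'."""
    parts = re.split(r",|\bthen\b", task)
    return [p.strip() for p in parts if p.strip()]

def retrieve(subtask, k=2):
    """Return up to k (score, api_name) pairs most similar to the subtask."""
    query = embed(subtask)
    scored = [(cosine(query, embed(doc)), name) for name, doc in API_DOCS.items()]
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [(score, name) for score, name in scored[:k] if score > 0]

def rerank(candidates, top_n=5):
    """Stand-in for the LLM-based Reranker: drop duplicate APIs, keep the
    best score per API, and return the top_n names by score."""
    best = {}
    for score, name in candidates:
        best[name] = max(score, best.get(name, 0.0))
    return sorted(best, key=lambda n: (-best[n], n))[:top_n]

def recommend(task, k=2, top_n=5):
    """Decompose the task, retrieve per subtask, then rerank the union."""
    candidates = []
    for subtask in decompose(task):
        candidates.extend(retrieve(subtask, k))
    return rerank(candidates, top_n)

def recall_at_k(recommended, relevant, k):
    """Fraction of relevant APIs that appear in the top-k recommendations."""
    return len(set(recommended[:k]) & set(relevant)) / len(relevant)

def precision_at_k(recommended, relevant, k):
    """Fraction of the top-k recommendations that are relevant."""
    return len(set(recommended[:k]) & set(relevant)) / k

task = "list files in a directory, then parse csv rows, then group items into batches"
relevant = {"list_files", "parse_csv", "batch"}
recommended = recommend(task, top_n=3)
print(decompose(task))
print(recommended)
print(recall_at_k(recommended, relevant, 3), precision_at_k(recommended, relevant, 3))
```

In this toy setting, per-subtask retrieval also surfaces a weakly related API (`shuffle`, which shares a single word with two of the subtasks), so the final filtering step matters even at this scale. Note that the score-based reranker here orders APIs by similarity, not by the order in which they would be called.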
