APICom: Automatic API Completion via Prompt Learning and Adversarial Training-based Data Augmentation

API (Application Programming Interface) recommendation assists developers in finding the API they need among numerous candidates, based on their needs and usage scenarios. Previous studies mainly modeled the problem as a recommendation task that returns multiple candidate APIs for a given query, among which developers may still fail to find the one they need. Motivated by research on neural machine translation, the problem can instead be modeled as a generation task that directly generates the required API for the developer query. Our preliminary investigation shows, however, that the performance of this intuitive approach is not promising, because errors occur when generating the prefix of the API. In most cases, developers already know some API prefix information during actual development. Therefore, we model the problem as an automatic completion task and propose a novel approach, APICom, based on prompt learning, which generates the API related to the query according to the prompts (i.e., API prefix information). Moreover, the effectiveness of APICom depends heavily on the quality of the training dataset. We therefore further design a novel gradient-based adversarial training method {\atpart} for data augmentation, which improves the normalized stability when generating adversarial examples. To evaluate the effectiveness of APICom, we consider a corpus of 33k developer queries and their corresponding APIs. Our experimental results show that APICom outperforms all state-of-the-art baselines by at least 40.02\%, 13.20\%, and 16.31\% in terms of the performance measures EM@1, MRR, and MAP, respectively. Finally, our ablation studies confirm the effectiveness of the component settings in APICom (i.e., our designed adversarial training method, the pre-trained model we use, and prompt learning).
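
The three performance measures named above can be illustrated with a minimal Python sketch. It uses the standard information-retrieval definitions of exact match at k, mean reciprocal rank, and mean average precision; the exact formulations used in the APICom evaluation are not given in this abstract and may differ. The function names, the `evaluate` helper, and the sample queries are hypothetical and serve only as illustration.

```python
from typing import Dict, List, Sequence, Set


def exact_match_at_k(ranked: Sequence[str], relevant: Set[str], k: int = 1) -> float:
    """Return 1.0 if any of the top-k generated APIs is a reference API, else 0.0."""
    return 1.0 if any(api in relevant for api in ranked[:k]) else 0.0


def reciprocal_rank(ranked: Sequence[str], relevant: Set[str]) -> float:
    """Return 1/rank of the first reference API in the ranked list (0.0 if absent)."""
    for rank, api in enumerate(ranked, start=1):
        if api in relevant:
            return 1.0 / rank
    return 0.0


def average_precision(ranked: Sequence[str], relevant: Set[str]) -> float:
    """Average the precision at each rank where a reference API appears.

    Assumes the ranked list contains no duplicate entries.
    """
    if not relevant:
        return 0.0
    hits = 0
    precision_sum = 0.0
    for rank, api in enumerate(ranked, start=1):
        if api in relevant:
            hits += 1
            precision_sum += hits / rank
    return precision_sum / len(relevant)


def evaluate(predictions: List[Sequence[str]],
             references: List[Set[str]],
             k: int = 1) -> Dict[str, float]:
    """Average the three measures over all queries (hypothetical helper)."""
    if len(predictions) != len(references) or not predictions:
        raise ValueError("predictions and references must be non-empty and aligned")
    n = len(predictions)
    pairs = list(zip(predictions, references))
    return {
        f"EM@{k}": sum(exact_match_at_k(p, r, k) for p, r in pairs) / n,
        "MRR": sum(reciprocal_rank(p, r) for p, r in pairs) / n,
        "MAP": sum(average_precision(p, r) for p, r in pairs) / n,
    }


if __name__ == "__main__":
    # Hypothetical ranked outputs for two queries, with their reference APIs.
    predictions = [
        ["java.util.List.add", "java.util.List.addAll"],
        ["java.io.File.delete", "java.io.File.exists"],
    ]
    references = [{"java.util.List.add"}, {"java.io.File.exists"}]
    print(evaluate(predictions, references, k=1))
```

When every query has exactly one reference API, MAP and MRR coincide under these definitions; they diverge only when a query has several acceptable APIs.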
