While having options can be liberating, too many options can lead to a sub-optimal choice. Software engineering is no exception. APIs have become indispensable in making software developers' lives easier: they help developers implement functionality faster and more efficiently. However, given the large number of open-source libraries available, choosing the right APIs is not a simple task. Previous studies on API recommendation leverage natural language (a query) to identify which APIs would be suitable for a given task. However, these studies consider only one source of input, i.e., GitHub or Stack Overflow, in isolation. No existing approach utilizes Stack Overflow to help generate better API sequence recommendations for queries obtained from GitHub. Therefore, in this study, we aim to provide a framework that improves API sequence recommendation by leveraging information from Stack Overflow. We propose PICASO, which leverages a bi-encoder for contrastive learning and a cross-encoder to build a classification model in order to find a semantically similar Stack Overflow post given an annotation (i.e., a code comment). PICASO then uses the title of the Stack Overflow post to expand the query, and uses the expanded queries to fine-tune CodeBERT, resulting in an API sequence generation model. Our experiments show that incorporating Stack Overflow information into CodeBERT improves the BLEU-4 score of API sequence generation by 10.8%.
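The retrieve, re-rank, and expand flow described above can be illustrated with a minimal sketch. This is not PICASO's implementation: the bag-of-words `embed` function is a toy stand-in for a trained bi-encoder, the Jaccard-based `pair_score` is a toy stand-in for a trained cross-encoder classifier, and the function names, the `top_k` parameter, and the `" | "` separator used to join the annotation and the title are all assumptions made for illustration.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def embed(text):
    """Toy stand-in for a bi-encoder: each text is encoded on its own."""
    return Counter(tokenize(text))


def cosine(u, v):
    """Cosine similarity between two sparse bag-of-words vectors."""
    dot = sum(u[t] * v[t] for t in u if t in v)
    norm_u = math.sqrt(sum(c * c for c in u.values()))
    norm_v = math.sqrt(sum(c * c for c in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def pair_score(annotation, title):
    """Toy stand-in for a cross-encoder: scores the pair jointly (Jaccard)."""
    a, b = set(tokenize(annotation)), set(tokenize(title))
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def expand_query(annotation, titles, top_k=2, sep=" | "):
    """Retrieve candidate titles, re-rank them, and append the best one."""
    query_vec = embed(annotation)
    # Stage 1 (bi-encoder role): cheap similarity search over all titles.
    candidates = sorted(
        titles, key=lambda t: cosine(query_vec, embed(t)), reverse=True
    )[:top_k]
    # Stage 2 (cross-encoder role): score each (annotation, title) pair.
    best = max(candidates, key=lambda t: pair_score(annotation, t))
    # Query expansion: the separator is a hypothetical choice.
    return annotation + sep + best


if __name__ == "__main__":
    # Hypothetical Stack Overflow titles, invented for this example.
    so_titles = [
        "How do I read a text file line by line in Java?",
        "How to sort a HashMap by value",
        "Convert a string to an integer in Java",
    ]
    print(expand_query("read file line by line", so_titles))
```

In a setup like the one the abstract describes, the expanded string (rather than the bare annotation) would serve as the input when fine-tuning the sequence generation model. The two-stage design reflects a common trade-off: independent encoding is cheap enough to scan many posts, while joint scoring is more expensive and is applied only to the shortlisted candidates.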