Programming languages usually come with official documentation that guides developers through APIs, methods, and classes. However, researchers have identified insufficient or inadequate documentation examples and flaws in an API's complex structure as barriers to learning it. As a result, developers may consult other sources (StackOverflow, GitHub, etc.) to learn more about an API. Recent studies have shown that unofficial documentation is a valuable source of information for generating code summaries. This motivated us to leverage such documentation, together with deep learning techniques, to generate high-quality summaries for APIs discussed in informal documentation. This paper proposes an automatic approach that uses BART, a state-of-the-art transformer model, to generate summaries for APIs discussed on StackOverflow. We built an oracle of human-generated summaries and evaluated our approach against it using ROUGE and BLEU, the most widely used evaluation metrics in text summarization. Furthermore, we empirically compared the quality of our summaries with those produced by a previous work. Our findings demonstrate that deep learning algorithms can improve summary quality: our approach outperforms the previous work by an average of 57% in Precision, 66% in Recall, and 61% in F-measure, and it runs 4.4 times faster.
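ROUGE scores are commonly reported as precision, recall, and F-measure over n-gram overlap between a generated summary and a human-written reference. The sketch below illustrates ROUGE-N under simplifying assumptions (lowercasing and whitespace tokenization, no stemming or punctuation handling); the function name `rouge_n` and the example sentences are hypothetical, and this is not the evaluation script used in the paper.

```python
from collections import Counter


def rouge_n(candidate: str, reference: str, n: int = 1) -> dict:
    """Return ROUGE-N precision, recall, and F-measure (illustrative sketch)."""

    def ngrams(text: str) -> Counter:
        # Assumption: lowercase + whitespace split; real toolkits also stem
        # tokens and strip punctuation.
        tokens = text.lower().split()
        return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

    cand, ref = ngrams(candidate), ngrams(reference)
    # Clipped overlap: each n-gram counts at most min(count in candidate,
    # count in reference) times.
    overlap = sum((cand & ref).values())
    precision = overlap / sum(cand.values()) if cand else 0.0
    recall = overlap / sum(ref.values()) if ref else 0.0
    f = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"precision": precision, "recall": recall, "f": f}


if __name__ == "__main__":
    generated = "use the list sort method to sort items in place"
    human = "the sort method sorts a list in place"
    print(rouge_n(generated, human, n=1))
```

Here 6 of the 10 generated unigrams also appear in the 8-token reference, giving a precision of 0.6, a recall of 0.75, and an F-measure of about 0.667. BLEU is computed differently (a geometric mean of clipped n-gram precisions with a brevity penalty), so it is not covered by this sketch.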