Explaining Software Bugs Leveraging Code Structures in Neural Machine Translation

Software bugs claim approximately 50% of development time and cost the global economy billions of dollars. Once a bug is reported, the assigned developer attempts to identify and understand the source code responsible for the bug and then corrects the code. Over the last five decades, there has been significant research on automatically finding or correcting software bugs. However, there has been little research on automatically explaining bugs to developers, an essential but highly challenging task. In this paper, we propose Bugsplainer, a transformer-based generative model that generates natural language explanations for software bugs by learning from a large corpus of bug-fix commits. Bugsplainer can leverage structural information and buggy patterns from the source code to generate an explanation for a bug. Our evaluation using three performance metrics shows that Bugsplainer can generate understandable and good explanations according to Google's standard, and can outperform multiple baselines from the literature. We also conduct a developer study involving 20 participants, in which the explanations from Bugsplainer were found to be more accurate, more precise, more concise and more useful than those from the baselines.
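To make "structural information" concrete, the following is a minimal sketch of one way code structure could be extracted from a buggy snippet and its fix. It is an illustration only, not Bugsplainer's actual pipeline: the function names `linearize_structure` and `structural_diff`, the use of Python's `ast` module, and the `mean` example are all assumptions introduced here.

```python
import ast
from collections import Counter


def linearize_structure(source: str) -> list[str]:
    """Return the AST node type names of `source` in pre-order.

    Hypothetical helper: a flat sequence like this is one simple way to
    expose code structure to a sequence model.
    """
    tokens: list[str] = []

    def visit(node: ast.AST) -> None:
        tokens.append(type(node).__name__)
        for child in ast.iter_child_nodes(node):
            visit(child)

    visit(ast.parse(source))
    return tokens


def structural_diff(buggy: str, fixed: str) -> Counter:
    """Count node types that occur more often in `fixed` than in `buggy`."""
    return Counter(linearize_structure(fixed)) - Counter(linearize_structure(buggy))


# Illustrative bug-fix pair (assumed, not taken from the paper's dataset):
# the fix guards against division by zero on an empty list.
buggy_code = "def mean(xs):\n    return sum(xs) / len(xs)\n"
fixed_code = "def mean(xs):\n    return sum(xs) / len(xs) if xs else 0.0\n"

added = structural_diff(buggy_code, fixed_code)
print(sorted(added))  # node types introduced by the fix
```

Here the fix introduces a conditional expression (`IfExp`) that the buggy version lacks. A structural signal of this kind is what a model could, in principle, associate with an explanation such as "missing check for an empty input".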
