Deep learning techniques for program analysis tasks such as code classification, summarization, and bug detection have attracted widespread interest. Traditional approaches, however, treat source code as natural language text and may therefore neglect significant structural or semantic details. In addition, most current source code representations focus solely on the code itself and ignore potentially beneficial additional context. This paper explores integrating static analysis and additional context, such as bug reports and design patterns, into source code representations for deep learning models. We take the Abstract Syntax Tree-based Neural Network (ASTNN) method and augment it with context information obtained from bug reports and design patterns, creating an enriched source code representation that significantly improves performance on common software engineering tasks such as code classification and code clone detection. Using existing open-source code data, our approach improves the representation and processing of source code and thereby improves task performance.
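As a rough illustration of what an "enriched" representation could look like, the sketch below splits a snippet into statement-level AST subtrees and pairs them with tokens drawn from a bug report and a design-pattern label. It is not the ASTNN implementation or the paper's pipeline: the function names (`statement_subtrees`, `context_tokens`, `enriched_representation`), the `<pattern:...>` token format, and the use of Python's standard `ast` module as the parser are all assumptions made for this example, and no neural encoder is included.

```python
import ast
import re


def statement_subtrees(source):
    """Return one list of AST node-type names per statement in `source`.

    Simplification (assumption): nested statements appear both inside
    their parent's list and as their own entry.
    """
    tree = ast.parse(source)
    definitions = (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)
    subtrees = []
    for node in ast.walk(tree):
        # Keep statements, but skip the enclosing definitions themselves.
        if isinstance(node, ast.stmt) and not isinstance(node, definitions):
            subtrees.append([type(child).__name__ for child in ast.walk(node)])
    return subtrees


def context_tokens(text):
    """Lowercase a free-text context source and keep alphabetic tokens only."""
    return re.findall(r"[a-z]+", text.lower())


def enriched_representation(source, bug_report, design_pattern):
    """Bundle structural features with extra context (hypothetical format)."""
    return {
        "statements": statement_subtrees(source),
        "context": context_tokens(bug_report)
        + ["<pattern:" + design_pattern.lower() + ">"],
    }


if __name__ == "__main__":
    code = "def add(a, b):\n    total = a + b\n    return total\n"
    rep = enriched_representation(code, "Overflow in add()", "Singleton")
    print(len(rep["statements"]), "statement subtrees")
    print(rep["context"])
```

In a full model, each statement subtree and the context tokens would be embedded and encoded before being combined; how that combination is done is left open here.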