Comments in source code are essential for developers to understand the code's purpose and use it correctly. However, as codebases evolve, keeping comments aligned with the code becomes increasingly difficult. Despite growing interest in automated solutions for detecting and correcting inconsistencies between code and its accompanying comments, current methods rely primarily on heuristic rules. In contrast, this paper presents DocChecker, a tool powered by deep learning. DocChecker identifies inconsistencies between code and comments and can also generate synthetic comments, which enables it to detect and correct comments that do not accurately reflect their corresponding code segments. We demonstrate the effectiveness of DocChecker on the Just-In-Time and CodeXGlue datasets in different settings. In particular, DocChecker achieves a new state-of-the-art result of 72.3% accuracy on the Inconsistency Code-Comment Detection (ICCD) task and 33.64 BLEU-4 on the code summarization task compared with other large language models (LLMs), even surpassing GPT 3.5 and CodeLlama. DocChecker is available for use and evaluation on GitHub at https://github.com/FSoft-AI4Code/DocChecker and as an online tool at http://4.193.50.237:5000/. A demonstration video is available on YouTube at https://youtu.be/FqnPmd531xw.