Logical errors in programming do not raise compiler alerts, making them hard to detect. These silent errors can disrupt a program's function or cause run-time issues. Correcting them requires deep insight into the program's logic, which highlights the importance of automated detection and repair. In this paper, we introduce LecPrompt, a prompt-based approach that localizes and repairs logical errors by harnessing the capabilities of CodeBERT, a transformer-based large language model trained on code. First, LecPrompt uses the language model to compute perplexity and log-probability metrics, pinpointing logical errors at both the token and line level. Through statistical analysis, it identifies tokens and lines that deviate significantly from the patterns the model expects and marks them as potential error sources. Second, by framing logical-error correction as a Masked Language Modeling (MLM) task, LecPrompt employs CodeBERT to autoregressively repair the identified error tokens. Finally, the soft-prompt method provides a novel solution in low-cost scenarios, allowing the model to be adapted to the specific nuances of the logical-error correction task without incurring high computational costs. To evaluate LecPrompt's performance, we devised a method for injecting logical errors into correct code and applied it to QuixBugs to produce the QuixBugs-LE dataset. Our evaluations on QuixBugs-LE for both Python and Java demonstrate the effectiveness of LecPrompt. For Python, LecPrompt achieves 74.58% top-1 token-level repair accuracy and 27.4% program-level repair accuracy. For Java, LecPrompt delivers 69.23% top-1 token-level repair accuracy and 24.7% full program-level repair accuracy.
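The statistical localization step described above can be illustrated with a minimal sketch. This is not the paper's implementation: it assumes per-token log probabilities have already been obtained from a code language model (e.g. via CodeBERT's MLM head), and the function name `flag_suspicious_tokens` and the z-score threshold are hypothetical choices for illustration.

```python
from statistics import mean, stdev

def flag_suspicious_tokens(token_logprobs, z_threshold=2.0):
    """Flag tokens whose log probability deviates far below the mean.

    token_logprobs: list of (token, logprob) pairs, where each logprob
    is assumed to come from a code language model.
    Returns the indices of tokens lying at least `z_threshold`
    standard deviations below the mean log probability, i.e. tokens
    the model finds unexpectedly improbable.
    """
    logprobs = [lp for _, lp in token_logprobs]
    mu, sigma = mean(logprobs), stdev(logprobs)
    return [
        i for i, (_, lp) in enumerate(token_logprobs)
        if sigma > 0 and (mu - lp) / sigma >= z_threshold
    ]

# Toy example: in a line like "if x < 0 + :", the stray "+" gets a
# much lower log probability than its neighbours and is flagged.
tokens = [("if", -0.2), ("x", -0.3), ("<", -0.5),
          ("0", -0.4), ("+", -4.0), (":", -0.3)]
print(flag_suspicious_tokens(tokens))  # → [4]
```

Tokens flagged this way would then be masked and refilled by the MLM repair step; the line-level variant aggregates the same log probabilities into a per-line perplexity before thresholding.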