Code linters play a crucial role in developing high-quality software systems by detecting potential problems (e.g., memory leaks) in source code. Despite their benefits, code linters are often language-specific, focus on certain types of issues, and are prone to false positives in the interest of speed. This paper investigates whether large language models (LLMs) can be used to develop a more versatile code linter, one that is language-independent, covers a variety of issue types, and maintains high speed. To achieve this, we collected a large dataset of code snippets and their associated issues. We then selected a language model and trained two classifiers on the collected data: a binary classifier that detects whether the code has issues, and a multi-label classifier that identifies the types of issues. Through extensive experimental studies, we demonstrated that the resulting LLM-based linter can achieve an accuracy of 84.9% for the binary classifier and 83.6% for the multi-label classifier.
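The abstract does not say how the two classifiers are combined at inference time. The following minimal Python sketch shows one plausible arrangement. It assumes that the binary classifier acts as a gate and that the multi-label classifier emits one logit per issue type, and it illustrates how the two decision rules differ. The label names in `ISSUE_TYPES`, the `lint_snippet` helper, and the 0.5 threshold are hypothetical. The logits stand in for the outputs of the fine-tuned models, which are not reproduced here.

```python
import math
from typing import Dict, List, Sequence


def sigmoid(x: float) -> float:
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


# Hypothetical issue-type labels; the paper's actual label set is not given here.
ISSUE_TYPES: List[str] = ["memory_leak", "null_dereference", "resource_leak", "style"]


def lint_snippet(
    binary_logit: float,
    type_logits: Sequence[float],
    threshold: float = 0.5,
) -> Dict[str, object]:
    """Hypothetical two-stage decision: a binary 'has issue' gate, then
    independent per-type decisions.

    binary_logit and type_logits are assumed outputs of the two classifiers.
    """
    if len(type_logits) != len(ISSUE_TYPES):
        raise ValueError("expected one logit per issue type")

    # Stage 1 (binary): a single probability that the snippet has any issue.
    if sigmoid(binary_logit) < threshold:
        # Predicted clean, so the multi-label stage is skipped (an assumption).
        return {"has_issue": False, "issue_types": []}

    # Stage 2 (multi-label): each type gets its own sigmoid and threshold,
    # unlike softmax, so zero, one, or several types may be reported.
    types = [
        name
        for name, logit in zip(ISSUE_TYPES, type_logits)
        if sigmoid(logit) >= threshold
    ]
    return {"has_issue": True, "issue_types": types}


if __name__ == "__main__":
    print(lint_snippet(2.0, [3.1, -1.2, 0.4, -4.0]))
    # {'has_issue': True, 'issue_types': ['memory_leak', 'resource_leak']}
    print(lint_snippet(-1.5, [3.0, 3.0, 3.0, 3.0]))
    # {'has_issue': False, 'issue_types': []}
```

Gating on the binary classifier is only one design choice. Both classifiers could equally be run independently on every snippet. The key contrast is that the multi-label stage thresholds each issue type separately, so a single snippet can carry several issue types at once.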