WitheredLeaf: Finding Entity-Inconsistency Bugs with LLMs

Entity-Inconsistency Bugs (EIBs) are a class of semantic bugs in which a syntactically valid but incorrect program entity, such as a variable identifier or function name, is used in place of the intended one. Such misuse often has security implications. Unlike straightforward syntactic vulnerabilities, EIBs are subtle and can remain undetected for years. Traditional detection methods, such as static analysis and dynamic testing, often fall short because EIBs are varied and context-dependent. With advances in Large Language Models (LLMs) such as GPT-4, we believe LLM-powered automatic EIB detection is becoming increasingly feasible thanks to these models' semantic understanding abilities. This research first systematically measures LLMs' capabilities in detecting EIBs, revealing that GPT-4, while promising, shows limited recall and precision that hinder its practical application. The primary problem lies in the model's tendency to focus on irrelevant code snippets that contain no EIBs. To address this, we introduce a novel cascaded EIB detection system named WitheredLeaf, which uses smaller, code-specific language models to filter out most negative cases before they reach the larger model, thereby significantly improving overall precision and recall. We evaluated WitheredLeaf on 154 Python and C GitHub repositories, each with over 1,000 stars, and identified 123 new flaws, 45% of which can be exploited to disrupt the program's normal operations. Of 69 submitted fixes, 27 have been merged.
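To make the cascade idea concrete, here is a minimal Python sketch under stated assumptions; it is not the authors' implementation. The abstract says only that smaller code-specific language models filter out most negative cases before the larger model is consulted. The names `Candidate`, `cascade`, `toy_cheap_score` and `toy_expensive_check` are hypothetical. The regex scorer is a toy stand-in for a small code model, and the expensive check is a stub standing in for an LLM query.

```python
import re
from dataclasses import dataclass
from typing import Callable, List


@dataclass(frozen=True)
class Candidate:
    """A code location that might contain an entity-inconsistency bug."""
    location: str
    snippet: str


def cascade(
    candidates: List[Candidate],
    cheap_score: Callable[[Candidate], float],
    expensive_check: Callable[[Candidate], bool],
    threshold: float = 0.5,
) -> List[Candidate]:
    """Run the cheap filter first; only survivors reach the expensive check."""
    survivors = [c for c in candidates if cheap_score(c) >= threshold]
    return [c for c in survivors if expensive_check(c)]


# ASSUMPTION: toy stand-in for a small code language model. It flags a
# snippet when the same identifier appears on both sides of `*`, `+` or `,`.
_REPEATED_OPERAND = re.compile(r"\b(\w+)\b\s*[*+,]\s*\1\b")


def toy_cheap_score(c: Candidate) -> float:
    return 1.0 if _REPEATED_OPERAND.search(c.snippet) else 0.0


# ASSUMPTION: stub standing in for a query to a large model such as GPT-4.
# It records which candidates it was asked about and accepts all of them.
expensive_calls: List[str] = []


def toy_expensive_check(c: Candidate) -> bool:
    expensive_calls.append(c.location)
    return True


CANDIDATES = [
    # A plausible EIB: `height` was likely intended as the second operand.
    Candidate("geometry.py:2", "return width * width"),
    Candidate("geometry.py:5", "return width * height"),
    Candidate("geometry.py:8", "return 2 * (width + height)"),
]

reports = cascade(CANDIDATES, toy_cheap_score, toy_expensive_check)
print([c.location for c in reports])
print(len(expensive_calls))
```

In this toy run only one of the three candidates reaches the expensive stage, which is the point of the cascade: the large model spends its attention on fewer, more suspicious snippets instead of on code that contains no EIB.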

