This paper presents a systematic analysis of the opportunities, challenges, and potential solutions in using large language models (LLMs) to detect vulnerabilities in smart contracts, based on our ongoing research. For smart contract vulnerability detection, practical usability depends on detecting as many true vulnerabilities as possible while keeping the number of false positives low. However, our empirical study using an LLM as a detection tool reveals an interesting yet conflicting finding: generating more answers with higher randomness substantially increases the likelihood that a correct answer is produced, but it also inevitably produces more false positives, which demands exhaustive manual verification. To mitigate this tension, we propose an adversarial framework dubbed GPTLens that breaks the traditional one-stage detection into two synergistic stages, generation and discrimination, for progressive detection and fine-tuning, wherein the LLM plays dual roles as auditor and critic, respectively. The goal of the auditor is to identify multiple diverse vulnerabilities with intermediate reasoning, while the goal of the critic is to evaluate the accuracy of the identified vulnerabilities and to examine the integrity of the detection reasoning. Experimental results and illustrative examples demonstrate that the auditor and critic work together to yield significant improvements over the traditional one-stage detection. GPTLens is intuitive, strategic, and entirely LLM-driven, requiring no specialist expertise in smart contracts, which suggests methodological generality and the potential to detect a broad spectrum of vulnerabilities. Our code is available at: https://github.com/git-disl/GPTLens.
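The two-stage idea described above can be illustrated with a minimal sketch. This is not the GPTLens implementation. The names `Finding`, `two_stage_detect`, `stub_auditor`, and `stub_critic` are hypothetical, the numeric score and top-k ranking are assumptions made for illustration, and the stub functions stand in for real LLM calls so that the example runs without any external service.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Finding:
    """One candidate vulnerability proposed by an auditor (hypothetical type)."""
    function_name: str
    vulnerability: str
    reasoning: str
    score: float = 0.0  # filled in by the critic stage


# Hypothetical signatures: an auditor maps (contract source, sample index) to
# candidate findings; a critic maps (contract source, finding) to a score.
Auditor = Callable[[str, int], List[Finding]]
Critic = Callable[[str, Finding], float]


def two_stage_detect(source: str, auditor: Auditor, critic: Critic,
                     n_auditors: int = 3, top_k: int = 2) -> List[Finding]:
    # Stage 1 (generation): sample several auditors to obtain diverse candidates.
    candidates: List[Finding] = []
    for i in range(n_auditors):
        candidates.extend(auditor(source, i))

    # Stage 2 (discrimination): the critic scores every candidate.
    for finding in candidates:
        finding.score = critic(source, finding)

    # Assumed selection rule: keep the top_k highest-scoring candidates.
    return sorted(candidates, key=lambda f: f.score, reverse=True)[:top_k]


def stub_auditor(source: str, i: int) -> List[Finding]:
    # Stand-in for a high-randomness LLM call; each sample guesses differently.
    guesses = [("withdraw", "reentrancy"),
               ("withdraw", "integer overflow"),
               ("setOwner", "missing access control")]
    fn, vuln = guesses[i % len(guesses)]
    return [Finding(fn, vuln, f"sample {i}: suspected {vuln} in {fn}")]


def stub_critic(source: str, finding: Finding) -> float:
    # Toy heuristic standing in for an LLM judgement of the auditor's reasoning.
    if finding.vulnerability == "reentrancy" and "call" in source:
        return 0.9
    return 0.2


CONTRACT = "function withdraw() public { msg.sender.call{value: bal}(''); }"

if __name__ == "__main__":
    for f in two_stage_detect(CONTRACT, stub_auditor, stub_critic):
        print(f.score, f.function_name, f.vulnerability)
```

In this sketch, raising `n_auditors` widens the candidate pool, which mirrors the higher recall and higher false-positive count noted above, while the critic's scores determine which candidates remain for manual review.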