Due to the immutable and decentralized nature of the Ethereum (ETH) platform, smart contracts are prone to security risks that can result in financial loss. While existing machine learning-based vulnerability detection algorithms achieve high accuracy at the contract level, they still require developers to manually inspect source code to locate bugs. To this end, we present G-Scan, the first end-to-end, fine-grained, line-level vulnerability detection system, and evaluate it on a first-of-its-kind real-world dataset. G-Scan first converts smart contracts into code graphs in a dependency- and hierarchy-preserving manner. Next, we train a graph neural network to identify vulnerable nodes and assess security risks. Finally, the code graphs with node vulnerability predictions are mapped back to the smart contracts for line-level localization. We train and evaluate G-Scan on a collected dataset of real-world smart contracts with line-level annotations for reentrancy, one of the most common and severe types of smart contract vulnerabilities. With its carefully designed graph representation and high-quality dataset, G-Scan achieves a 93.02% F1-score in contract-level vulnerability detection and a 93.69% F1-score in line-level vulnerability localization. Additionally, its lightweight graph neural network enables G-Scan to localize vulnerabilities in a smart contract with 6.1k lines of code within 1.2 seconds.
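The abstract does not say exactly how node predictions are mapped back to source lines or how the contract-level verdict is derived. The sketch below is one plausible way to do it, under stated assumptions. Each code-graph node records the source line it came from. Node probabilities are aggregated per line by taking the maximum. A contract is flagged as vulnerable when any line reaches a threshold. The sketch also computes a line-level F1 score from predicted and annotated line sets. `CodeNode`, `line_scores`, `localize`, `contract_is_vulnerable`, the 0.5 threshold, and the toy scores are all hypothetical and are not G-Scan's actual implementation.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class CodeNode:
    # Hypothetical node record: each code-graph node remembers its source line.
    node_id: int
    line: int  # 1-indexed source line


def line_scores(nodes, node_probs):
    """Aggregate node-level probabilities per line by taking the max (assumed rule)."""
    scores = defaultdict(float)
    for node in nodes:
        scores[node.line] = max(scores[node.line], node_probs[node.node_id])
    return dict(scores)


def localize(nodes, node_probs, threshold=0.5):
    """Return sorted source lines whose aggregated score reaches the threshold."""
    return sorted(l for l, s in line_scores(nodes, node_probs).items() if s >= threshold)


def contract_is_vulnerable(nodes, node_probs, threshold=0.5):
    """Contract-level verdict: flag the contract if any line is localized (assumed rule)."""
    return bool(localize(nodes, node_probs, threshold))


def f1(predicted, annotated):
    """Line-level F1 between predicted and annotated vulnerable line sets."""
    predicted, annotated = set(predicted), set(annotated)
    tp = len(predicted & annotated)
    if tp == 0:
        return 0.0
    precision = tp / len(predicted)
    recall = tp / len(annotated)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # Toy reentrancy pattern: external call on line 4 before the balance reset on line 5.
    nodes = [CodeNode(0, 3), CodeNode(1, 4), CodeNode(2, 4), CodeNode(3, 5)]
    probs = {0: 0.10, 1: 0.92, 2: 0.70, 3: 0.81}  # made-up scores, not G-Scan output
    flagged = localize(nodes, probs)
    print(flagged)  # [4, 5]
    print(contract_is_vulnerable(nodes, probs))  # True
    print(f1(flagged, annotated=[4, 5]))  # 1.0
```

Taking the per-line maximum means a single confidently flagged node, such as the external call, is enough to mark its line. A mean over nodes would dilute that signal when many benign nodes share the line.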