NATLM: Detecting Defects in NFT Smart Contracts Leveraging LLM

Security issues have become increasingly significant as Non-fungible Tokens (NFTs) evolve rapidly. Because NFTs are traded as digital assets, they have become prime targets for cyber attackers. NFT smart contracts may contain undiscovered defects that, if exploited, could lead to substantial financial losses. To tackle this issue, this paper presents NATLM (NFT Assistant LLM), a framework designed to detect potential defects in NFT smart contracts. The framework identifies four common types of defects in NFT smart contracts: ERC-721 Reentrancy, Public Burn, Risky Mutable Proxy, and Unlimited Minting. Relying exclusively on large language models (LLMs) for defect detection can lead to a high false-positive rate, so NATLM integrates static analysis with an LLM, specifically Gemini Pro 1.5. First, NATLM employs static analysis to extract structural, syntactic, and execution-flow information from the code, represented as Abstract Syntax Trees (ASTs) and Control Flow Graphs (CFGs). These extracted features are then combined with vectors of known defect examples to form a matrix that is input into the knowledge base. Next, the feature vectors and code vectors of the contract under analysis are compared with the contents of the knowledge base. Finally, the LLM performs deep semantic analysis to enhance detection, providing a more comprehensive and accurate identification of potential security issues. In experiments on 8,672 collected NFT smart contracts, NATLM achieved an overall precision of 87.72%, a recall of 89.58%, and an F1 score of 88.94%. These results outperform the baselines in identifying the four common defect types.
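
The knowledge-base comparison step described above can be illustrated with a minimal sketch. It is not NATLM's implementation: the keyword list `FEATURES`, the two toy knowledge-base entries, the 0.8 threshold, and the function names are all hypothetical, and simple keyword counts stand in for the AST- and CFG-derived features. The sketch only shows the general idea of comparing a contract's feature vector against vectors of known defect examples with cosine similarity, so that matching candidates could then be handed to an LLM for semantic analysis.

```python
import math
import re

# Hypothetical feature set: keyword counts stand in for AST/CFG-derived features.
FEATURES = ["_safeMint", "_burn", "onlyOwner", "require", "delegatecall", "totalSupply"]


def feature_vector(source):
    """Count how often each feature keyword occurs in the contract source."""
    return [float(len(re.findall(re.escape(name), source))) for name in FEATURES]


def cosine(a, b):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


# Toy knowledge base (assumed examples, not taken from the paper's dataset).
KNOWLEDGE_BASE = {
    "Public Burn": feature_vector(
        "function burn(uint256 id) public { _burn(id); }"
    ),
    "Unlimited Minting": feature_vector(
        "function mint(address to, uint256 id) public onlyOwner { _safeMint(to, id); }"
    ),
}


def candidate_defects(source, threshold=0.8):
    """Return (label, similarity) pairs at or above the threshold, best match first."""
    vec = feature_vector(source)
    scores = {label: cosine(vec, kb_vec) for label, kb_vec in KNOWLEDGE_BASE.items()}
    hits = [(label, score) for label, score in scores.items() if score >= threshold]
    return sorted(hits, key=lambda pair: -pair[1])


if __name__ == "__main__":
    contract = "function destroy(uint256 tokenId) external { _burn(tokenId); }"
    # Candidates found here would be passed on to the LLM stage for deeper analysis.
    print(candidate_defects(contract))
```

In a pipeline of the kind the abstract describes, the similarity stage would act only as a pre-filter that narrows down which defect types are plausible; the final judgment would rest with the LLM's semantic analysis.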
