With recent advances in large language models (LLMs), this paper explores the potential of leveraging state-of-the-art LLMs, such as GPT-4, to transfer existing human-written properties (e.g., those from Certora auditing reports) and automatically generate customized properties for unknown code. To this end, we embed existing properties into a vector database and retrieve a reference property for LLM-based in-context learning to generate a new property for a given piece of code. While this basic process is relatively straightforward, ensuring that the generated properties are (i) compilable, (ii) appropriate, and (iii) verifiable presents challenges. To address (i), we use compilation and static analysis feedback as an external oracle to guide LLMs in iteratively revising the generated properties. For (ii), we consider multiple dimensions of similarity to rank the candidate properties and employ a weighted algorithm to identify the top-K properties as the final result. For (iii), we design a dedicated prover to formally verify the correctness of the generated properties. We have implemented these strategies into a novel LLM-based property generation tool called PropertyGPT. Our experiments show that PropertyGPT can generate comprehensive and high-quality properties, achieving an 80% recall compared with the ground truth. It successfully detected 26 CVEs/attack incidents out of 37 tested and also uncovered 12 zero-day vulnerabilities, earning $8,256 in bug bounty rewards.
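The retrieve-and-rank step described above can be sketched as follows. This is a minimal, hypothetical illustration, not PropertyGPT's actual implementation: the similarity dimensions, weights, and function names are all assumed for the example. The idea is that each candidate reference property is scored by a weighted sum of cosine similarities across several embedding dimensions, and the top-K candidates are kept.

```python
import math

def cosine(a, b):
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0

def rank_top_k(query_dims, candidates, weights, k=3):
    """Rank candidate properties against a query.

    query_dims: {dimension_name: embedding_vector} for the target code.
    candidates: {candidate_id: {dimension_name: embedding_vector}}.
    weights:    {dimension_name: weight} -- the weighted-ranking step.
    Returns the ids of the k highest-scoring candidates.
    """
    scored = []
    for cand_id, cand_dims in candidates.items():
        # Weighted sum of per-dimension cosine similarities.
        score = sum(
            w * cosine(query_dims[d], cand_dims[d])
            for d, w in weights.items()
        )
        scored.append((score, cand_id))
    scored.sort(reverse=True)
    return [cand_id for _, cand_id in scored[:k]]
```

In a real deployment the embeddings would come from a vector database rather than in-memory dictionaries, but the weighted multi-dimension scoring would take the same shape.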