The introduction of large language models (LLMs) such as ChatGPT and Google PaLM 2 for smart contract generation seems to be the first well-established instance of an AI pair programmer. LLMs have access to a large number of open-source smart contracts, which lets them draw on more Solidity code than other code generation tools. Although initial, informal assessments of LLMs for smart contract generation are promising, a systematic evaluation is needed to explore the limits and benefits of these models. The main objective of this study is to assess the quality of the smart contract code that LLMs generate. We also aim to evaluate how the quality and variety of the input parameters fed to the LLMs affect the output. To this end, we created an experimental setup that evaluates the generated code in terms of validity, correctness, and efficiency. Our study finds crucial evidence that security bugs are introduced into the generated smart contracts and that the overall quality and correctness of the code suffer. However, we also identify areas where the generated code can be improved. Finally, the paper proposes several potential research directions for improving the process, quality, and safety of generated smart contract code.