The introduction of large language models (LLMs) such as ChatGPT and Google PaLM 2 for smart contract generation appears to be the first well-established instance of an AI pair programmer. LLMs have access to a large number of open-source smart contracts, enabling them to draw on a larger body of Solidity code than other code generation tools. Although initial, informal assessments of LLMs for smart contract generation are promising, a systematic evaluation is needed to explore the limits and benefits of these models. The main objective of this study is to assess the quality of the smart contract code that LLMs generate. We also aim to evaluate how the quality and variety of the input parameters fed to LLMs affect that code. To this end, we created an experimental setup for evaluating the generated code in terms of validity, correctness, and efficiency. Our study finds crucial evidence that security bugs are introduced into the generated smart contracts and that the overall quality and correctness of the code are adversely affected. However, we also identify areas where LLM-based generation can be improved. The paper also proposes several potential research directions for improving the process, quality, and safety of generated smart contract code.