We present an end-to-end framework for the systematic evaluation of LLM-generated smart contracts from natural-language specifications. The system parses contractual text into structured schemas, generates Solidity code, and performs automated quality assessment through compilation and security checks. Using CrewAI-style agent teams with iterative refinement, the pipeline produces structured artifacts with full provenance metadata. Quality is measured across five dimensions — functional completeness, variable fidelity, state-machine correctness, business-logic fidelity, and code quality — which are aggregated into composite scores. The framework supports paired evaluation against ground-truth implementations, quantifying alignment and identifying systematic error modes such as logic omissions and state-transition inconsistencies. This provides a reproducible benchmark for empirical research on smart-contract synthesis quality and supports extensions to formal verification and compliance checking.
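The composite scoring described above can be sketched as a weighted aggregation over the five named dimensions. This is a minimal illustration only: the dimension keys come from the abstract, while the equal weighting and the `composite_score` helper are assumptions, not the framework's actual scoring rule.

```python
# Hedged sketch of five-dimension composite scoring.
# Dimension names are taken from the abstract; equal weights and this
# function signature are illustrative assumptions.

DIMENSIONS = [
    "functional_completeness",
    "variable_fidelity",
    "state_machine_correctness",
    "business_logic_fidelity",
    "code_quality",
]

def composite_score(scores, weights=None):
    """Aggregate per-dimension scores (each in [0, 1]) into one composite."""
    if weights is None:
        # Equal weighting is an assumption for illustration.
        weights = {d: 1.0 / len(DIMENSIONS) for d in DIMENSIONS}
    missing = [d for d in DIMENSIONS if d not in scores]
    if missing:
        raise ValueError(f"missing dimension scores: {missing}")
    return sum(weights[d] * scores[d] for d in DIMENSIONS)

# Example evaluation of one generated contract (values are made up).
example = {
    "functional_completeness": 0.9,
    "variable_fidelity": 0.8,
    "state_machine_correctness": 1.0,
    "business_logic_fidelity": 0.7,
    "code_quality": 0.85,
}
print(round(composite_score(example), 3))  # → 0.85
```

In a real instantiation the weights would likely be tuned per use case, e.g. weighting state-machine correctness more heavily for workflow-style contracts.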