In the rapidly evolving world of blockchain systems, the efficient development and maintenance of smart contracts has become a critical task. Smart contract code summarization can significantly facilitate the maintenance of smart contracts and mitigate their vulnerabilities. Large Language Models (LLMs), such as GPT-4o and Gemini-1.5-Pro, can generate code summaries from code examples embedded in prompts. However, the performance of LLMs in code summarization remains suboptimal compared to fine-tuned models (e.g., CodeT5+, CodeBERT). Therefore, we propose SCLA, a framework that leverages LLMs and semantic augmentation to improve code summarization performance. SCLA constructs the smart contract's Abstract Syntax Tree (AST) to extract latent semantics, which it uses to form a semantically augmented prompt. For evaluation, we use a large-scale dataset comprising 40,000 real-world contracts. Experimental results demonstrate that SCLA, with its enhanced prompt, significantly improves the quality of generated code summaries. SCLA surpasses state-of-the-art models (e.g., CodeBERT, CodeT5, and CodeT5+), achieving 37.53% BLEU-4, 52.54% METEOR, 56.97% ROUGE-L, and 63.44% BLEURT.
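The semantic-augmentation idea above can be sketched as follows. This is a minimal illustration, not the paper's actual pipeline: the semantic field names (`function_name`, `state_vars`, `modifiers`, `callees`) are hypothetical stand-ins for the latent semantics SCLA extracts from the contract's AST, and the prompt layout is an assumption.

```python
# Sketch: assemble a semantically augmented prompt from extracted facts.
# The semantics dict keys below are hypothetical, chosen for illustration;
# a real system would populate them from an AST of the Solidity source.

def build_augmented_prompt(code: str, semantics: dict) -> str:
    """Combine raw contract code with extracted semantic context."""
    lines = [
        "Summarize the following Solidity function.",
        "",
        "### Semantic context",
        f"- Function: {semantics['function_name']}",
        f"- State variables touched: {', '.join(semantics['state_vars']) or 'none'}",
        f"- Modifiers: {', '.join(semantics['modifiers']) or 'none'}",
        f"- Internal calls: {', '.join(semantics['callees']) or 'none'}",
        "",
        "### Code",
        code,
    ]
    return "\n".join(lines)

# Example usage with a toy transfer function.
snippet = "function transfer(address to, uint256 amount) public { ... }"
prompt = build_augmented_prompt(snippet, {
    "function_name": "transfer",
    "state_vars": ["balances"],
    "modifiers": [],
    "callees": ["_beforeTokenTransfer"],
})
```

The augmented prompt places the extracted semantics before the raw code, so the LLM sees explicit structural facts rather than having to infer them from the source alone.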