Smart contract code summarization is crucial for efficient maintenance and vulnerability mitigation. Although many studies apply Large Language Models (LLMs) to summarization, their performance still falls short of fine-tuned models such as CodeT5+ and CodeBERT. Some approaches combine LLMs with data flow analysis but fail to fully capture the hierarchy and control structures of the code, which leads to information loss and degraded summary quality. We propose SCLA, an LLM-based method that improves summarization by integrating a Control Flow Graph (CFG) and semantic facts from the code's control flow into a semantically enriched prompt. SCLA uses a control flow extraction algorithm to derive control flows from semantic nodes in the Abstract Syntax Tree (AST) and constructs the corresponding CFG. Code semantic facts are the explicit and implicit information within the AST that is relevant to smart contracts. This enables LLMs to better capture the structural and contextual dependencies of the code. We validate SCLA through comprehensive experiments on a dataset of 40,000 real-world smart contracts. The experiments show that SCLA significantly improves summarization quality, outperforming the state-of-the-art (SOTA) baselines by 26.7%, 23.2%, 16.7%, and 14.7% in BLEU-4, METEOR, ROUGE-L, and BLEURT scores, respectively.
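To make the idea of deriving control flow from AST nodes and placing it in a prompt more concrete, the following is a minimal sketch and not SCLA's actual algorithm. It assumes a hypothetical toy AST (`Node`) with only four node kinds, an illustrative `withdraw` function, and made-up semantic facts; `build_cfg`, `build_prompt`, and the prompt wording are all assumptions introduced here for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical toy AST node; not the Solidity compiler's AST format."""
    kind: str                                   # "stmt", "if", "while", or "return"
    label: str                                  # statement text or branch condition
    body: list = field(default_factory=list)    # then-branch or loop body
    orelse: list = field(default_factory=list)  # else-branch (only for "if")


def build_cfg(stmts):
    """Return (labels, edges); edges are (src, dst) index pairs into labels."""
    labels, edges, returns = ["ENTRY"], [], []

    def link(src, dst):
        if (src, dst) not in edges:
            edges.append((src, dst))

    def new(label, preds):
        labels.append(label)
        nid = len(labels) - 1
        for p in preds:
            link(p, nid)
        return nid

    def walk(nodes, preds):
        # preds: CFG nodes whose control falls through to the next statement
        for n in nodes:
            if not preds:          # statements after a return are unreachable
                break
            if n.kind == "if":
                cond = new(f"if {n.label}", preds)
                preds = walk(n.body, [cond]) + walk(n.orelse, [cond])
            elif n.kind == "while":
                cond = new(f"while {n.label}", preds)
                for p in walk(n.body, [cond]):
                    link(p, cond)  # back edge to the loop condition
                preds = [cond]
            elif n.kind == "return":
                returns.append(new(n.label, preds))
                preds = []
            else:
                preds = [new(n.label, preds)]
        return preds

    exits = walk(stmts, [0])
    new("EXIT", exits + returns)   # fall-through and return nodes reach EXIT
    return labels, edges


def build_prompt(source, labels, edges, facts):
    """Assemble an illustrative prompt; the wording is an assumption."""
    cfg = "\n".join(f"{labels[a]} -> {labels[b]}" for a, b in edges)
    return (
        "Summarize the following smart contract function.\n\n"
        f"Code:\n{source}\n\n"
        f"Control flow graph:\n{cfg}\n\n"
        "Semantic facts:\n" + "\n".join(f"- {f}" for f in facts)
    )


# Illustrative input: a toy withdraw function and invented facts.
withdraw = [
    Node("stmt", "require(balances[msg.sender] >= amount)"),
    Node("if", "amount > limit", body=[Node("stmt", "emit LargeWithdrawal(amount)")]),
    Node("stmt", "balances[msg.sender] -= amount"),
    Node("return", "return true"),
]
labels, edges = build_cfg(withdraw)
prompt = build_prompt(
    "function withdraw(uint amount) public returns (bool) { ... }",
    labels,
    edges,
    ["visibility: public", "writes state variable: balances"],
)
print(prompt)
```

The sketch serializes the CFG as plain `source -> target` lines so that branch and loop structure is visible to the model as text; a real pipeline would presumably start from a compiler-produced AST and cover many more node types.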