Demand is growing for translating legacy C codebases to Rust in order to build safety-critical systems. Various approaches have emerged for this task, but each faces inherent trade-offs: rule-based methods often fail to produce safe, idiomatic Rust, while LLM-based methods frequently fail to generate semantically equivalent Rust code because modules depend heavily on one another across the codebase. Recent studies have shown that both kinds of solution are limited to small-scale programs. In this paper, we propose EvoC2Rust, an automated framework for converting complete C projects into equivalent Rust projects. EvoC2Rust uses a skeleton-guided strategy for project-level translation. The pipeline consists of three stages: 1) it decomposes the C project into functional modules, uses a feature-mapping-enhanced LLM to transform definitions and macros, and generates type-checked function stubs, which together form a compilable Rust skeleton; 2) it incrementally translates functions, replacing the corresponding stub placeholders; 3) it repairs compilation errors by combining an LLM with static analysis. Through evolutionary augmentation, EvoC2Rust combines the advantages of rule-based and LLM-based solutions. We evaluate EvoC2Rust on open-source benchmarks and six industrial projects for project-level C-to-Rust translation. It outperforms the strongest LLM-based baseline by 17.24% in syntax accuracy and 14.32% in semantic accuracy, and achieves a 43.59% higher code safety rate than the best rule-based tool.
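To make the skeleton idea concrete, the following is a minimal sketch of what a partially translated module might look like between stages 1 and 2. It is not taken from EvoC2Rust's output: the C declarations in the comment, the `IntVec` type, and the functions `int_vec_sum` and `int_vec_max` are all hypothetical, and the use of `unimplemented!` as the stub placeholder is an assumption made for illustration.

```rust
// Hypothetical C source being migrated (illustrative only, not from the paper):
//   typedef struct { int *data; size_t len; } IntVec;
//   long int_vec_sum(const IntVec *v);
//   int  int_vec_max(const IntVec *v);

/// Stage 1: type definitions are translated first, so that every
/// function signature in the skeleton can be type-checked.
#[derive(Debug, Default)]
pub struct IntVec {
    // Assumed mapping: the C pointer-plus-length pair becomes an owned Vec.
    pub data: Vec<i32>,
}

/// Stage 2 (already done for this function): the stub body has been
/// replaced by a translated implementation with the same signature.
pub fn int_vec_sum(v: &IntVec) -> i64 {
    // Widen each element before summing so the total cannot overflow i32.
    v.data.iter().map(|&x| i64::from(x)).sum()
}

/// Stage 1 stub (still pending): the signature compiles and type-checks,
/// but the body is a placeholder that panics if it is ever called.
pub fn int_vec_max(_v: &IntVec) -> Option<i32> {
    unimplemented!("int_vec_max: awaiting translation")
}
```

Because every stub has a valid signature, the crate compiles at each step, so a function can be translated and checked in isolation while its callers and callees remain placeholders. Calling `int_vec_sum(&IntVec { data: vec![1, 2, 3] })` already works here, whereas calling `int_vec_max` panics until its stub is replaced.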