Automating C-to-Rust migration is critical for improving software security without sacrificing performance. Traditional rule-based methods struggle with the diversity of C idioms and often produce rigid, unidiomatic Rust code. Large Language Models (LLMs) trained on massive code corpora offer a promising alternative: by leveraging cross-language generalization, they can generate more idiomatic and maintainable Rust code. However, several challenges remain. First, existing LLM-based approaches handle cross-file dependencies poorly, either ignoring them or including entire files as context, which limits accurate dependency modeling. Second, complex dependencies and structured inputs and outputs make it difficult to verify syntactic correctness and functional equivalence at the repository level. Third, the lack of large-scale C-Rust parallel data constrains model performance. We propose DepTrans, a framework that combines model capability enhancement with structured inference. DepTrans introduces Reinforcement-Aligned Syntax Training, which improves generation quality through multi-task fine-tuning and feedback-driven reinforcement learning. It further applies Dependency-Guided Iterative Refinement to capture fine-grained cross-file dependencies and iteratively refine the generated Rust code. We construct a dataset of 85k training samples and a benchmark of 145 repository-level instances. Experiments show that DepTrans achieves a 60.7% compilation success rate and 43.5% computational accuracy, outperforming the strongest baseline by 22.8 and 17.3 percentage points, respectively. It also successfully builds 7 of 15 industrial C projects, demonstrating its practical potential.
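The abstract does not specify how Dependency-Guided Iterative Refinement is implemented, so the following is only a minimal Python sketch of the general idea under stated assumptions, not DepTrans's actual method. It assumes a precomputed, hypothetical symbol table mapping each C symbol to its defining file and the symbols it references. `dependency_closure` gathers only the definitions a target function transitively needs (rather than whole files), ordered dependencies-first. `refine` is a generic compile-and-retry loop in which the hypothetical callables `translate` and `check` stand in for the LLM and for compiling or testing the Rust candidate. All symbol and file names in the example (`parse_config`, `Config`, `io.c`, and so on) are invented for illustration.

```python
from typing import Callable, Optional

# Hypothetical symbol table (an assumption for illustration): each C symbol
# maps to (defining file, names of symbols it references).
SymbolTable = dict[str, tuple[str, list[str]]]


def dependency_closure(table: SymbolTable, root: str) -> list[str]:
    """Collect the transitive dependencies of `root`, dependencies first.

    Symbols absent from the table (e.g. libc functions) are skipped, and the
    `visited` set makes the traversal terminate on cyclic references.
    """
    order: list[str] = []
    visited: set[str] = set()

    def visit(sym: str) -> None:
        if sym in visited or sym not in table:
            return
        visited.add(sym)
        for dep in table[sym][1]:
            visit(dep)
        order.append(sym)  # post-order: emitted after its (acyclic) dependencies

    visit(root)
    return order


def refine(
    translate: Callable[[list[str]], str],
    check: Callable[[str], list[str]],
    max_rounds: int = 3,
) -> tuple[Optional[str], int]:
    """Regenerate until `check` reports no errors or the budget runs out.

    `translate` stands in for an LLM call that receives the previous round's
    diagnostics; `check` stands in for compiling/testing the Rust candidate.
    Returns (accepted candidate or None, rounds used).
    """
    feedback: list[str] = []
    for rounds in range(1, max_rounds + 1):
        candidate = translate(feedback)
        feedback = check(candidate)
        if not feedback:
            return candidate, rounds
    return None, max_rounds


# Invented example: read_line and unused_helper live in the same file,
# but only read_line is actually needed by parse_config.
EXAMPLE_TABLE: SymbolTable = {
    "parse_config": ("config.c", ["Config", "read_line", "strlen"]),
    "Config": ("config.h", ["Mode"]),
    "Mode": ("types.h", []),
    "read_line": ("io.c", ["Buffer"]),
    "Buffer": ("io.h", []),
    "unused_helper": ("io.c", []),
}

if __name__ == "__main__":
    deps = dependency_closure(EXAMPLE_TABLE, "parse_config")
    print(deps)  # ['Mode', 'Config', 'Buffer', 'read_line', 'parse_config']
    # Toy translator that only "fixes" its output once it sees diagnostics.
    result = refine(
        translate=lambda fb: "fixed" if fb else "draft",
        check=lambda code: [] if code == "fixed" else ["error[E0308]: mismatched types"],
    )
    print(result)  # ('fixed', 2)
```

Note that `unused_helper` is excluded even though it shares `io.c` with `read_line`. This is the contrast the abstract draws between fine-grained dependency context and including entire files.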