We present Encrust (Encapsulated Substitution and Agentic Refinement on a Live Scaffold for Safe C-to-Rust Translation), a two-phase pipeline for translating real-world C projects into safe Rust. Existing approaches either produce unsafe output with no memory-safety guarantees or translate functions in isolation. Isolated translation cannot detect cross-unit type mismatches and cannot handle unsafe constructs that require whole-program reasoning. Furthermore, function-level LLM pipelines require coordinated updates to every caller whenever a type signature changes, while project-scale systems often fail to produce compilable output under real-world dependency complexity. Encrust addresses these limitations in two ways. It decouples boundary adaptation from function logic through an Application Binary Interface (ABI)-preserving wrapper pattern, and it validates every intermediate state against the integrated codebase. Phase 1 (Encapsulated Substitution) translates each function through an ABI-preserving wrapper that splits it into two components: a caller-transparent shim that retains the original raw-pointer signature, and a safe inner function that the LLM generates from a clean, scope-limited prompt. This lets each function change its types independently, with automatic rollback on failure and no coordinated caller updates. A deterministic, type-directed wrapper-elimination pass then removes the wrappers of successfully translated functions. Phase 2 (Agentic Refinement) resolves unsafe constructs that lie beyond per-function scope, including static mut globals, skipped wrapper pairs, and failed translations. It does so with an LLM agent that operates on the whole codebase under a baseline-aware verification gate. We evaluate Encrust on 7 GNU Coreutils programs and 8 libraries from the Laertes benchmark. Encrust substantially reduces unsafe constructs in all 15 programs while maintaining full test-vector correctness.
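The Phase 1 wrapper pattern can be illustrated with a minimal Rust sketch. The function names `count_char`, `count_char_safe`, `count_lines`, and `count_lines_safe` are hypothetical and are not taken from Encrust or from Coreutils. The sketch assumes a C function `size_t count_char(const char *buf, size_t len, char c)` and shows three pieces. The first is the safe inner function that an LLM would be asked to write. The second is the shim that keeps the original raw-pointer, C-ABI signature so that unmigrated callers compile unchanged. The third is a caller before and after wrapper elimination. It omits the export attribute (e.g. `#[unsafe(no_mangle)]`) that a real translated crate would keep on the shim. Its null-pointer check is also an assumption, added so that the slice conversion is sound.

```rust
use std::os::raw::c_char;

// Safe inner function (hypothetical): the only part the LLM is asked to write.
// It uses idiomatic types (a byte slice) instead of a pointer/length pair.
pub fn count_char_safe(buf: &[u8], c: u8) -> usize {
    buf.iter().filter(|&&b| b == c).count()
}

// Caller-transparent shim (hypothetical): keeps the original raw-pointer,
// C-ABI signature, so existing callers are not touched when the inner
// function's types change.
pub unsafe extern "C" fn count_char(buf: *const c_char, len: usize, c: c_char) -> usize {
    if buf.is_null() {
        // Assumption for this sketch: treat NULL as an empty buffer, because
        // slice::from_raw_parts requires a non-null pointer even for len == 0.
        return 0;
    }
    // SAFETY: the caller must guarantee that `buf` points to `len` readable
    // bytes, which is the same contract the original C function had.
    let slice = unsafe { std::slice::from_raw_parts(buf.cast::<u8>(), len) };
    count_char_safe(slice, c as u8)
}

// A not-yet-migrated caller still passes raw pointers through the shim.
pub unsafe extern "C" fn count_lines(buf: *const c_char, len: usize) -> usize {
    unsafe { count_char(buf, len, b'\n' as c_char) }
}

// After wrapper elimination (sketch): once the caller itself holds safe
// types, it calls the inner function directly and the shim can be deleted.
pub fn count_lines_safe(buf: &[u8]) -> usize {
    count_char_safe(buf, b'\n')
}
```

Because the shim's signature never changes, `count_char_safe` can be regenerated or rolled back without editing `count_lines`. Eliminating the wrapper then amounts to rewriting callers like `count_lines` into the form of `count_lines_safe` and deleting the shim once no raw-pointer callers remain. How Encrust's type-directed pass decides when that rewrite is valid is not modeled here.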