ENCRUST: Encapsulated Substitution and Agentic Refinement on a Live Scaffold for Safe C-to-Rust Translation

We present Encrust (Encapsulated Substitution and Agentic Refinement on a Live Scaffold for Safe C-to-Rust Translation), a two-phase pipeline for translating real-world C projects to safe Rust. Existing approaches either produce unsafe output with no memory-safety guarantees or translate functions in isolation, so they cannot detect cross-unit type mismatches or handle unsafe constructs that require whole-program reasoning. Furthermore, function-level LLM pipelines require coordinated caller updates whenever a type signature changes, while project-scale systems often fail to produce compilable output under real-world dependency complexity. Encrust addresses these limitations by decoupling boundary adaptation from function logic via an Application Binary Interface (ABI)-preserving wrapper pattern and by validating each intermediate state against the integrated codebase. Phase 1 (Encapsulated Substitution) translates each function through an ABI-preserving wrapper that splits it into two components: a caller-transparent shim that retains the original raw-pointer signature, and a safe inner function, which is what the LLM is asked to produce under a clean, scope-limited prompt. Because callers see only the unchanged shim, each function's types can change independently, with automatic rollback on failure and no coordinated caller updates. A deterministic, type-directed wrapper elimination pass then removes the wrappers once translation has succeeded. Phase 2 (Agentic Refinement) resolves unsafe constructs that lie beyond per-function scope, including static mut globals, skipped wrapper pairs, and failed translations, using an LLM agent that operates on the whole codebase under a baseline-aware verification gate. We evaluate Encrust on 7 GNU Coreutils programs and 8 libraries from the Laertes benchmark. The results show a substantial reduction in unsafe constructs across all 15 programs while maintaining full test-vector correctness.
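
The wrapper pattern described for Phase 1 can be pictured with a small Rust sketch. This is an illustration under stated assumptions, not Encrust's actual output: the function names `array_sum` and `array_sum_inner`, the choice of a slice as the safe parameter type, and the null-pointer handling are all hypothetical. The shim keeps a raw-pointer signature so existing callers need no changes, while the inner function uses only safe types.

```rust
use std::slice;

/// Safe inner function (hypothetical name): the scope-limited unit an LLM
/// would be asked to write. It sees a borrowed slice, never a raw pointer.
fn array_sum_inner(values: &[i32]) -> i64 {
    // Widen to i64 before summing so the addition cannot overflow i32.
    values.iter().map(|&v| i64::from(v)).sum()
}

/// Caller-transparent shim (hypothetical name): keeps the original
/// raw-pointer signature and C ABI, so call sites compile unchanged.
///
/// # Safety
/// `ptr` must be null, or point to `len` readable, initialized `i32` values.
pub unsafe extern "C" fn array_sum(ptr: *const i32, len: usize) -> i64 {
    // Assumption for this sketch: a null pointer or zero length is treated
    // as an empty array rather than as an error.
    if ptr.is_null() || len == 0 {
        return array_sum_inner(&[]);
    }
    // SAFETY: the caller guarantees `ptr` is valid for `len` elements.
    let values = unsafe { slice::from_raw_parts(ptr, len) };
    array_sum_inner(values)
}
```

In this sketch all remaining unsafety is confined to the shim. Once every caller can supply a slice directly, a pass like the wrapper elimination step described above could in principle redirect call sites to `array_sum_inner` and delete `array_sum`; how Encrust decides this is not specified here.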
