Despite its extensive use in high-performance, low-level systems programming, C is susceptible to vulnerabilities caused by manual memory management and unsafe pointer operations. Rust, a modern systems programming language, offers a compelling alternative: its ownership model and type system ensure memory safety without sacrificing performance. In this paper, we present Syzygy, an automated approach to translating C to safe Rust. Our technique synergistically combines LLM-driven code and test translation, guided by execution information generated through dynamic analysis. This paired translation runs incrementally in a loop over the program, visiting code elements in dependency order while maintaining correctness at every step. Our approach offers novel insights into combining the strengths of LLMs and dynamic analysis when scaling code generation together with testing. We apply our approach to successfully translate Zopfli, a high-performance compression library with ~3000 lines of code and 98 functions. We validate the translation by testing its equivalence with the source C program on a set of inputs. To our knowledge, this is the largest automated and test-validated C-to-safe-Rust code translation achieved so far.
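The incremental, dependency-ordered loop described in the abstract can be illustrated with a minimal sketch. This is not Syzygy's implementation: the function names (`dependency_order`, `equivalent_on`, `translate_incrementally`), the element names (`hash`, `emit`, `compress`), and the simulated candidate check are all hypothetical, and the LLM call and the dynamic analysis are replaced by a stand-in closure. The sketch only shows the control flow under those assumptions: order the code elements so that dependencies come first, then accept a candidate for each element only once it matches the reference behavior on recorded inputs.

```rust
use std::collections::{BTreeMap, BTreeSet};

/// Orders code elements so that each one appears after everything it depends on.
/// `deps` maps an element to the elements it uses. Returns `None` if the graph
/// has a cycle or mentions a dependency that is not itself a key.
fn dependency_order<'a>(deps: &BTreeMap<&'a str, Vec<&'a str>>) -> Option<Vec<&'a str>> {
    let mut done: BTreeSet<&'a str> = BTreeSet::new();
    let mut order = Vec::new();
    while done.len() < deps.len() {
        // Collect every pending element whose dependencies are all handled.
        let ready: Vec<&'a str> = deps
            .iter()
            .filter(|(name, ds)| !done.contains(*name) && ds.iter().all(|d| done.contains(d)))
            .map(|(name, _)| *name)
            .collect();
        if ready.is_empty() {
            return None; // no progress possible
        }
        for name in ready {
            done.insert(name);
            order.push(name);
        }
    }
    Some(order)
}

/// True if `candidate` agrees with `reference` on every recorded input.
/// This is test-based equivalence, not a proof of equivalence.
fn equivalent_on<I, O: PartialEq>(
    reference: impl Fn(&I) -> O,
    candidate: impl Fn(&I) -> O,
    inputs: &[I],
) -> bool {
    inputs.iter().all(|x| reference(x) == candidate(x))
}

/// Visits elements in the given order and records the first attempt index whose
/// candidate passes. Stops at the first element with no passing candidate, so
/// every accepted step has been checked before the next one starts.
fn translate_incrementally<'a>(
    order: &[&'a str],
    max_attempts: u32,
    passes: impl Fn(&str, u32) -> bool,
) -> Result<Vec<(&'a str, u32)>, &'a str> {
    let mut accepted = Vec::new();
    for &name in order {
        match (0..max_attempts).find(|&attempt| passes(name, attempt)) {
            Some(attempt) => accepted.push((name, attempt)),
            None => return Err(name),
        }
    }
    Ok(accepted)
}

fn main() {
    // Hypothetical call graph; these are not Zopfli's actual function names.
    let mut deps = BTreeMap::new();
    deps.insert("compress", vec!["hash", "emit"]);
    deps.insert("hash", vec![]);
    deps.insert("emit", vec!["hash"]);
    let order = dependency_order(&deps).expect("dependency graph must be acyclic");

    // Stand-in for "generate a candidate, then test it against the C behavior".
    let inputs = [0u32, 1, 7, 255];
    let reference = |x: &u32| x.wrapping_mul(31).wrapping_add(1);
    let passes = |name: &str, attempt: u32| {
        // Simulated failure: the first candidate for "emit" is off by one.
        let buggy = name == "emit" && attempt == 0;
        let candidate = |x: &u32| x.wrapping_mul(31).wrapping_add(if buggy { 2 } else { 1 });
        equivalent_on(&reference, candidate, &inputs)
    };

    println!("order: {:?}", order);
    println!("accepted: {:?}", translate_incrementally(&order, 3, passes));
}
```

In this toy setup a failing candidate simply triggers another attempt. A real pipeline would presumably feed the failing test and the observed execution information back into the next generation step, which the sketch does not model.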