VERT: Verified Equivalent Rust Transpilation with Large Language Models as Few-Shot Learners

Rust is a programming language that combines memory safety and low-level control, providing C-like performance while guaranteeing the absence of undefined behaviors by default. Rust's growing popularity has prompted research on safe and correct transpiling of existing code-bases to Rust. Existing work falls into two categories: rule-based and large language model (LLM)-based. While rule-based approaches can theoretically produce correct transpilations that maintain input-output equivalence to the original, they often yield unreadable Rust code that uses unsafe subsets of the Rust language. On the other hand, while LLM-based approaches typically produce more readable, maintainable, and safe code, they do not provide any guarantees about correctness. In this work, we present VERT, a tool that can produce readable Rust transpilations with formal guarantees of correctness. VERT's only requirement is that there is Web Assembly compiler for the source language, which is true for most major languages. VERT first uses the Web Assembly compiler to obtain an oracle Rust program. In parallel, VERT uses an LLM to generate a readable candidate Rust program. This candidate is verified against the oracle, and if verification fails, we regenerate a new candidate transpilation until verification succeeds. We evaluate VERT by transpiling a suite of 1,394 programs taken from competitive programming style benchmarks. Combining Anthropic's Claude-2 and VERT increases Rust transpilations passing property-based testing from 31% to 54% and bounded model-checking from 1% to 42% compared to using Claude alone. In addition, we evaluate VERT's ability to generate non-trivial safe Rust on programs taken from real-world C projects that make significant use of pointers. Our results provide insights into the limitations of LLMs to write safe Rust.

翻译：Rust是一种兼具内存安全性与底层控制能力的编程语言，在保证类C性能的同时默认杜绝未定义行为。随着Rust的日益普及，如何将现有代码库安全正确地转译为Rust成为研究热点。现有工作可分为两类：基于规则的方法和基于大语言模型（LLM）的方法。基于规则的方法理论上能生成保持输入输出等价性的正确转译结果，但往往产生可读性差且使用Rust不安全子集的代码；而基于LLM的方法通常能生成更可读、可维护且安全的代码，却无法提供正确性保证。本研究提出VERT工具，能够生成具有形式化正确性保证的可读Rust转译代码。VERT唯一的前提要求是源语言存在WebAssembly编译器——这对大多数主流语言皆成立。VERT首先通过WebAssembly编译器获取参考Rust程序（oracle），同时使用LLM生成可读的候选Rust程序。该候选程序将与参考程序进行验证，若验证失败则重新生成候选转译直至验证成功。我们通过转译来自竞技编程风格基准测试的1,394个程序集对VERT进行评估。相较于单独使用Claude模型，结合Anthropic的Claude-2与VERT可将通过基于属性测试的Rust转译比例从31%提升至54%，将有界模型检验通过率从1%提升至42%。此外，我们还评估了VERT在涉及指针操作的真实C语言项目上生成非平凡安全Rust代码的能力。研究结果揭示了LLM编写安全Rust代码的局限性。