C and C++ are widely used programming languages, yet they suffer from significant memory- and thread-safety issues. Recent studies have explored automated translation of C/C++ to safer languages such as Rust. However, these studies have focused mostly on the correctness and safety of the translated code. Those properties are critical, but other important quality concerns (e.g., performance, robustness, and maintainability) remain largely unexplored. This work investigates the strengths and weaknesses of three C-to-Rust translators: C2Rust (a transpiler), C2SaferRust (an LLM-guided transpiler), and TranslationGym (an LLM-based direct translator). We perform an in-depth quantitative and qualitative analysis of several important quality attributes of the Rust code translated from the popular GNU coreutils, using human-written translations as a baseline. To assess the internal and external quality of the Rust code, we: (i) apply Clippy, a rule-based, state-of-the-practice Rust static analysis tool; (ii) investigate the ability of an LLM (GPT-4o) to identify issues potentially overlooked by Clippy; and (iii) manually analyze the issues reported by Clippy and GPT-4o. Our results show that while newer techniques reduce some unsafe and non-idiomatic patterns, they frequently introduce new issues, revealing systematic trade-offs that are not visible under existing evaluation practices. Notably, none of the automated techniques consistently matches or exceeds human-written translations across all quality dimensions, yet even human-written Rust code exhibits persistent internal quality issues, such as poor readability and non-idiomatic patterns. Together, these findings show that translation quality remains a multi-dimensional challenge, requiring systematic evaluation and targeted tool support beyond both naive automation and manual rewriting.