Translating machine code into human-readable high-level languages is an open research problem in reverse engineering. Despite recent advances in LLM-based decompilation to C, decompilation to modern languages such as Dart and Swift remains unexplored. In this paper, we study small specialized LLMs as idiomatic decompilers for such languages. We also investigate augmenting the training data with synthetic same-language examples, and compare this against adding human-written examples from a related language (Swift → Dart). We use CodeBLEU to evaluate the readability of the decompiled code and compile@k to measure its syntactic correctness. On a 73-function Dart test set spanning diverse complexity levels, our 4B specialized model achieves a CodeBLEU of 71.3 (95% CI 65.5-77.1), comparable to a ~480B code model (73.1; 67.4-78.8), with overlapping confidence intervals. On a subset of 34 natural Dart functions, it reaches compile@5 = 79.4% (Wilson 95% CI 63.2-89.7), versus 64.7% (47.9-78.5) for the base model; this difference is suggestive but not statistically significant at the 0.05 level. Adding Swift training data helps at 8B but not at 4B, suggesting a capacity threshold for effective cross-lingual transfer. Overall, our results show that small specialized models can generate readable, idiomatic Dart with meaningful identifiers using minimal compute.
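The abstract does not define compile@k. A common convention, assumed here, mirrors the unbiased pass@k estimator used in the Codex/HumanEval evaluation: for each function, draw n candidate decompilations, count the c that compile, and estimate the probability that at least one of k samples compiles. The helper names `compile_at_k` and `mean_compile_at_k` are hypothetical, and compilation outcomes are assumed to be precomputed booleans; this is a sketch, not the authors' evaluation code.

```python
from math import comb


def compile_at_k(n: int, c: int, k: int) -> float:
    """Unbiased estimate of P(at least one of k samples compiles),
    given n samples of which c compiled (pass@k-style estimator; assumed definition)."""
    if n - c < k:
        # Every size-k subset must contain at least one compiling sample.
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)


def mean_compile_at_k(results: list[list[bool]], k: int) -> float:
    """Average compile@k over functions; results[i][j] says whether
    sample j for function i compiled (hypothetical input format)."""
    scores = [compile_at_k(len(r), sum(r), k) for r in results]
    return sum(scores) / len(scores)


if __name__ == "__main__":
    # Three toy functions, five samples each.
    toy = [
        [False, True, False, False, False],  # one compiling sample -> 1.0
        [False] * 5,                         # none compile -> 0.0
        [True] * 5,                          # all compile -> 1.0
    ]
    print(mean_compile_at_k(toy, k=5))
```

When n equals k, the estimator reduces to "does any of the k samples compile", which is the simplest reading of compile@5.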
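The reported Wilson intervals can be reproduced with the standard Wilson score formula. The sketch below assumes the percentages correspond to 27 of 34 functions (27/34 ≈ 79.4%) and 22 of 34 functions (22/34 ≈ 64.7%); these counts are inferred from the percentages rather than stated in the text, and `wilson_interval` is a hypothetical helper name.

```python
from math import sqrt


def wilson_interval(successes: int, n: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a binomial proportion (z = 1.96 gives ~95%)."""
    p = successes / n
    denom = 1 + z**2 / n
    center = (p + z**2 / (2 * n)) / denom
    margin = z * sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return center - margin, center + margin


if __name__ == "__main__":
    # Assumed counts: specialized model 27/34, base model 22/34.
    for label, k in [("specialized", 27), ("base", 22)]:
        lo, hi = wilson_interval(k, 34)
        print(f"{label}: {k / 34:.1%} (95% CI {lo:.1%}-{hi:.1%})")
```

Unlike the normal-approximation (Wald) interval, the Wilson interval stays within [0, 1] and behaves better at small n such as 34, which is why the two intervals here are asymmetric around their point estimates. Their overlap is consistent with the abstract's statement that the difference is not significant at the 0.05 level, although overlapping intervals alone are not a formal hypothesis test.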