Binary decompilation is a critical reverse-engineering task that aims to reconstruct high-level source code from stripped executables. Although Large Language Models (LLMs) have recently shown promise on this task, they often suffer from "logical hallucinations" and "semantic misalignment" caused by the irreversible loss of semantic information during compilation, so the generated code frequently fails to re-execute. In this study, we propose Cognitive Decompiler Refinement with Robustness (CoDe-R), a lightweight two-stage code refinement framework. The first stage, applied during training, introduces Semantic Cognitive Enhancement (SCE), a Rationale-Guided Semantic Injection strategy that trains the model to recover high-level algorithmic intent alongside the code. The second stage, applied during inference, introduces a Dynamic Dual-Path Fallback (DDPF) mechanism that adaptively balances semantic recovery against syntactic stability via a hybrid verification strategy. Evaluation on the HumanEval-Decompile benchmark shows that CoDe-R, using a 1.3B backbone, establishes a new State-of-the-Art (SOTA) in the lightweight regime. Notably, it is the first 1.3B model to exceed an Average Re-executability Rate of 50.00%, significantly outperforming the baseline and helping to bridge the gap between efficient models and expert-level performance. Our code is available at https://github.com/Theaoi/CoDe-R.
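The abstract does not spell out how DDPF chooses between its two paths, so the following is only a minimal sketch of the general dual-path fallback idea, not the authors' implementation. It assumes a "semantic" generator (standing in for the rationale-guided model), a "stable" generator (standing in for a more conservative baseline), and a list of verification checks; the names `dual_path_decode`, `balanced_delimiters`, `semantic_path`, and `stable_path` are hypothetical and do not come from the CoDe-R repository.

```python
from dataclasses import dataclass
from typing import Callable, Sequence

# Hypothetical type aliases: a generator maps assembly text to source code,
# and a check maps candidate source code to pass/fail.
Generator = Callable[[str], str]
Check = Callable[[str], bool]


@dataclass
class Decision:
    code: str   # the source code that was selected
    path: str   # "semantic" or "fallback"


def balanced_delimiters(code: str) -> bool:
    """Cheap static check: are (), [] and {} balanced?

    Simplification: delimiters inside string literals or comments are
    counted too, so this is only a rough proxy for syntactic validity.
    """
    closers = {")": "(", "]": "[", "}": "{"}
    stack = []
    for ch in code:
        if ch in "([{":
            stack.append(ch)
        elif ch in closers:
            if not stack or stack.pop() != closers[ch]:
                return False
    return not stack


def dual_path_decode(
    asm: str,
    semantic_path: Generator,
    stable_path: Generator,
    checks: Sequence[Check],
) -> Decision:
    """Prefer the semantic candidate; fall back if any check rejects it."""
    candidate = semantic_path(asm)
    if all(check(candidate) for check in checks):
        return Decision(candidate, "semantic")
    # Assumption: the fallback output is returned without re-verification.
    return Decision(stable_path(asm), "fallback")
```

In practice the list of checks could combine a static test such as the one above with a dynamic one (for example, a callable that invokes a compiler), which is one plausible reading of "hybrid verification"; the abstract itself does not say which checks are used.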