Human-Oriented Binary Reverse Engineering (HOBRE) lies at the intersection of binary and source code, aiming to lift binary code to human-readable content relevant to source code, thereby bridging the binary-source semantic gap. Recent advancements in uni-modal code model pre-training, particularly in generative Source Code Foundation Models (SCFMs) and binary understanding models, have laid the groundwork for transfer learning applicable to HOBRE. However, existing approaches to HOBRE rely heavily on uni-modal models, using SCFMs for supervised fine-tuning or general LLMs for prompting, resulting in suboptimal performance. Inspired by recent progress in large multi-modal models, we propose that it is possible to harness the strengths of uni-modal code models from both modalities to bridge the semantic gap effectively. In this paper, we introduce a novel probe-and-recover framework that incorporates a binary-source encoder-decoder model and black-box LLMs for binary analysis. Our approach leverages the pre-trained knowledge within SCFMs to synthesize relevant, symbol-rich code fragments as context. This additional context enables black-box LLMs to enhance recovery accuracy. We demonstrate significant improvements in zero-shot binary summarization and binary function name recovery, with a 10.3% relative gain in CHRF and a 16.7% relative gain in a GPT-4-based metric for summarization, as well as a 6.7% and 7.4% absolute increase in token-level precision and recall for name recovery, respectively. These results highlight the effectiveness of our approach in automating and improving binary code analysis.
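The two-stage probe-and-recover flow summarized above can be sketched as follows. This is a minimal illustrative sketch, not the paper's implementation: `probe_model` and `blackbox_llm` are hypothetical stand-ins for the binary-source encoder-decoder and the black-box LLM API, and the canned strings exist only to make the control flow concrete.

```python
def probe_model(decompiled: str) -> str:
    """Stand-in for the binary-source encoder-decoder ("probe" stage),
    which synthesizes a relevant, symbol-rich source fragment from
    decompiled binary code. A real probe would be an SCFM-based model;
    here we return a canned fragment for illustration only."""
    return "int checksum(const char *buf, int len) { /* ... */ }"

def blackbox_llm(prompt: str) -> str:
    """Stand-in for a black-box LLM queried through some API
    (hypothetical; the actual model and interface are not fixed here)."""
    return "Computes a checksum over a byte buffer."

def probe_and_recover(decompiled: str) -> str:
    # 1. Probe: synthesize symbol-rich source-like context from the binary.
    context = probe_model(decompiled)
    # 2. Recover: prompt the black-box LLM with the decompiled code plus
    #    the synthesized context, so it can ground its summary in
    #    recovered symbols rather than stripped identifiers alone.
    prompt = (
        "Summarize the following decompiled function.\n"
        f"Decompiled code:\n{decompiled}\n"
        f"Possibly related source fragment:\n{context}\n"
    )
    return blackbox_llm(prompt)

summary = probe_and_recover(
    "undefined4 FUN_00101234(char *param_1, int param_2) { /* ... */ }"
)
print(summary)
```

The key design point the sketch reflects is that the probe's output is used purely as additional prompt context: the black-box LLM is never fine-tuned, which is what makes the recovery stage applicable to closed models.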