SoK: AI-Augmented Binary Reversing

Binary reversing is fundamental to software understanding, vulnerability discovery, malware investigation, and firmware auditing. However, it remains inherently challenging because compilation irreversibly discards semantic information. Recent advances in machine learning, large language models (LLMs), and agentic AI systems have accelerated the adoption of AI-augmented binary reversing. Yet the resulting body of work has become increasingly fragmented across reversing domains, artifact representations, learning approaches, and evaluation practices. This paper presents the first comprehensive systematization of knowledge on AI-augmented binary reversing. We analyze 144 research papers published since 2015 and organize them into 22 binary reversing domains according to their inference tasks. We further introduce a unified taxonomy spanning conventional and AI-augmented reversing pipelines. Our taxonomy connects traditional analysis techniques, binary-derived artifacts, representation strategies, learning paradigms, and downstream inference tasks, while clarifying the emerging roles of LLMs and agentic AI systems. By establishing a common vocabulary and structured framework, we provide a holistic view of the field's evolution over the past decade. Our study reveals common structures underlying seemingly disparate approaches, highlights persistent technical challenges and evaluation gaps, and identifies promising opportunities for future research. Collectively, these insights clarify the current state of the field and provide a foundation for the next generation of reliable and scalable AI-augmented binary reversing systems.
