Automatic decompilers produce C code that is functionally correct but often unreadable. This paper addresses one stage of the reverse engineering workflow: improving the readability of decompiled code using LLM agents guided by quantitative metrics. We describe how the approach evolved over three phases. Phase 1, tool-driven steering via Ghidra MCP, suffered from incomplete coverage and inconsistent improvements because it lacked quantitative guidance. Phase 2, which relied on structural similarity validation alone, revealed that agents optimize for metrics in unintended ways, producing code that is structurally equivalent but less readable. Our contribution, which forms the third phase, is the Quantitative Readability Score (QRS) framework, a composite metric that combines a structural similarity gate with three independent readability sub-metrics: Lexical Surprisal, Structural Simplicity, and Idiomatic Quality. We demonstrate that QRS-guided refinement enables LLM agents to make targeted readability improvements without sacrificing correctness. The broader reverse engineering workflow (binary lifting, decompilation cleanup, and achieving functional equivalence) is discussed as context but remains out of scope.
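To make the shape of a gated composite metric concrete, the following is a minimal sketch and not the QRS definition itself. The abstract does not specify how the gate or the sub-metrics are combined, so everything numeric here is an assumption: the function name `qrs`, the gate threshold of 0.9, the equal weights, the hard zero on gate failure, and the convention that each sub-metric is normalised to [0, 1] with higher meaning more readable.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SubScores:
    """Three readability sub-metrics, each ASSUMED normalised to [0, 1],
    where higher means more readable (orientation is an assumption)."""
    lexical_surprisal: float
    structural_simplicity: float
    idiomatic_quality: float


def qrs(similarity: float,
        scores: SubScores,
        gate_threshold: float = 0.9,                      # hypothetical value
        weights: tuple = (1 / 3, 1 / 3, 1 / 3)) -> float:  # hypothetical weights
    """Hypothetical gated composite score.

    If the structural similarity between the original and the refined
    decompilation falls below the gate, the candidate scores 0.0 no matter
    how readable it looks. Otherwise the score is a weighted mean of the
    three sub-metrics.
    """
    if similarity < gate_threshold:
        return 0.0  # gate failed: readability gains are not credited
    parts = (scores.lexical_surprisal,
             scores.structural_simplicity,
             scores.idiomatic_quality)
    return sum(w * p for w, p in zip(weights, parts))


if __name__ == "__main__":
    candidate = SubScores(0.6, 0.9, 0.3)
    print(qrs(0.95, candidate))  # gate passes: weighted mean of sub-scores
    print(qrs(0.50, candidate))  # gate fails: score is zeroed
```

Under these assumptions the gate acts as a hard precondition rather than a fourth weighted term, so an agent cannot trade structural similarity for a higher readability score. This is one plausible way to address the metric-gaming behaviour reported for Phase 2.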