Binary vulnerability analysis is increasingly performed by LLM-based agents that work iteratively over multiple passes, with the model as the core decision-maker. However, how such systems organize exploration over hundreds of reasoning steps remains poorly understood, because context windows are limited and the relevant behaviors are implicit at the token level. We present the first large-scale, trace-level study showing that multi-pass LLM reasoning gives rise to structured, token-level implicit patterns. Analyzing 99,563 reasoning steps across 521 binaries, we identify four dominant patterns that emerge implicitly from the reasoning traces: early pruning, path-dependent lock-in, targeted backtracking, and knowledge-guided prioritization. These patterns serve as an abstraction of LLM reasoning: rather than following explicit control flow or predefined heuristics, exploration is organized through implicit decisions that regulate path selection, commitment, and revision. Our analysis shows that the patterns form a stable, structured system with distinct temporal roles and measurable characteristics. Our results provide the first systematic characterization of LLM-driven binary analysis and a foundation for building more reliable analysis systems.