Needles at Scale: LLM-Assisted Target Selection for Windows Vulnerability Research

The attack surface of a modern operating system is a haystack: thousands of signed binaries and millions of functions, almost none relevant to any given vulnerability. A human analyst or an LLM agent must pick the function worth reading before analyzing it. At whole-OS scope, this target selection, not the analysis, is the binding constraint. We present Symbolicate-Enrich-Sample, a low-cost batch pipeline that turns a corpus of production Windows binaries into a queryable, priority-ranked research queue. We (i) recover function-level symbols for stripped vendor binaries by auto-fetching the public symbol files and joining them to a recovered call graph; (ii) attach cheap, deterministic structural features to each named function and, conditioned on those features, use a low-cost language model to assign a reachability tier, a risk level, a bug-class hypothesis, and a rationale; and (iii) draw diverse, prioritized batches via a priority-weighted importance sampler. The contribution is a selection substrate: the prioritization layer a downstream detector or LLM agent runs on top of. Across a whole Windows image of 7,231,419 functions, the labels are markedly selective, and stacking deterministic filters on them leaves a ~22K-function shortlist: the candidate needles, few enough for a human or agent to work through. We characterize the pipeline's selectivity and its failure modes, describe the methodology, and report aggregate statistics; we withhold the derived dataset for legal and dual-use reasons.

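As an illustration of step (iii), the following is a minimal sketch of one way to draw a diverse, priority-weighted batch. It is not the pipeline's implementation. It assumes a hypothetical mapping from risk level and reachability tier to numeric weights, a hypothetical `Candidate` record, and a hypothetical per-binary cap as the diversity mechanism, none of which are specified above. For the weighted draw it substitutes a standard technique, Efraimidis-Spirakis weighted sampling without replacement, where each item receives the key u^(1/w) and the largest keys are kept.

```python
import random
from collections import Counter
from dataclasses import dataclass

# Hypothetical label-to-weight mappings; the real tiers and weights are not given.
RISK_WEIGHT = {"low": 1.0, "medium": 4.0, "high": 16.0}
REACH_WEIGHT = {"internal": 1.0, "local": 2.0, "remote": 4.0}


@dataclass(frozen=True)
class Candidate:
    """One labeled function (hypothetical schema)."""
    binary: str
    function: str
    risk: str
    reachability: str


def priority(c: Candidate) -> float:
    """Combine the two labels into a single positive sampling weight."""
    return RISK_WEIGHT[c.risk] * REACH_WEIGHT[c.reachability]


def sample_batch(candidates, k, per_binary_cap=2, seed=None):
    """Draw up to k distinct candidates, favoring high priority.

    Each candidate gets the key u ** (1 / w) with u uniform in [0, 1);
    visiting candidates in descending key order yields a weighted sample
    without replacement. The per-binary cap (an assumption made here)
    keeps one large binary from dominating the batch.
    """
    rng = random.Random(seed)
    keyed = [
        (rng.random() ** (1.0 / priority(c)), i, c)
        for i, c in enumerate(candidates)
    ]
    keyed.sort(reverse=True)  # the index i breaks ties, so candidates are never compared

    batch, per_binary = [], Counter()
    for _, _, c in keyed:
        if per_binary[c.binary] >= per_binary_cap:
            continue  # skip: this binary already has its share of the batch
        batch.append(c)
        per_binary[c.binary] += 1
        if len(batch) == k:
            break
    return batch
```

Calling `sample_batch(candidates, k=50, seed=0)` returns a reproducible batch. Because the draw is randomized rather than a strict top-k cut, lower-priority functions still surface occasionally, which offers some protection against systematic labeling errors.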