Existing knowledge probing methods rely on pre-defined queries, limiting extraction to known concepts. We introduce DecompressionLM, a stateless framework for zero-shot concept graph extraction that discovers what language models encode without pre-specified queries or shared cross-sequence state. Our method targets three limitations of common decoding-based probing approaches: (i) cross-sequence coupling that concentrates probability mass on high-frequency prefixes, (ii) competitive decoding effects that suppress long-tail concepts, and (iii) scalability constraints arising from sequential exploration. Using van der Corput low-discrepancy sequences with arithmetic decoding, DecompressionLM enables deterministic, embarrassingly parallel generation without shared state across sequences. Across two model families and five quantization variants, we find that activation-aware quantization (AWQ-4bit) expands concept coverage by 30-170%, while uniform quantization (GPTQ-Int4) induces 71-86% coverage collapse; these divergent behaviors are not reliably reflected by explanation-level perplexity. Corpus-based verification further reveals a 19.6-point hallucination gap between top- and bottom-ranked MMLU-Pro Law models. DecompressionLM establishes concept coverage as a complementary evaluation dimension for assessing knowledge breadth and factual grounding in compressed models intended for deployment.
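The sketch below illustrates, under stated assumptions, the decoding idea summarized above: a van der Corput low-discrepancy value is treated as an arithmetic code and deterministically decoded into a token sequence, so each sequence index can be processed independently with no shared cross-sequence state. The `next_token_probs` callable is a hypothetical stand-in for any language model that maps a prefix to a next-token distribution; it is not part of the original text, and this is not the authors' released implementation.

```python
# Minimal sketch (assumptions noted above, not the authors' implementation):
# decode a single van der Corput code into a token sequence via arithmetic
# decoding against a next-token distribution supplied by `next_token_probs`.

from typing import Callable, List, Sequence


def van_der_corput(index: int, base: int = 2) -> float:
    """Radical inverse of `index` in the given base; returns a value in [0, 1)."""
    value, denom = 0.0, 1.0
    while index > 0:
        index, digit = divmod(index, base)
        denom *= base
        value += digit / denom
    return value


def arithmetic_decode(
    next_token_probs: Callable[[List[int]], Sequence[float]],  # assumed LM interface
    code: float,
    max_len: int = 16,
    eos_id: int = 0,
) -> List[int]:
    """Decode one real-valued code by repeatedly locating it in the
    next-token CDF and rescaling the residual into [0, 1)."""
    tokens: List[int] = []
    residual = code
    for _ in range(max_len):
        probs = next_token_probs(tokens)
        cumulative, choice = 0.0, None
        for tok_id, p in enumerate(probs):
            if p <= 0.0:
                continue
            if residual < cumulative + p:
                residual = (residual - cumulative) / p  # rescale residual code
                choice = tok_id
                break
            cumulative += p
        if choice is None or choice == eos_id:
            break
        tokens.append(choice)
    return tokens


# Each sequence index is independent, so decoding is embarrassingly parallel:
#   codes = [van_der_corput(i + 1) for i in range(num_sequences)]
#   outputs = [arithmetic_decode(model_probs, c) for c in codes]
```

Because the code alone determines every token choice, re-running the same index reproduces the same sequence, which is one way to realize the deterministic, stateless generation the abstract describes.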