Detecting pre-training data in Large Language Models (LLMs) is crucial for auditing data privacy and copyright compliance, yet it remains challenging in black-box, zero-shot settings where computational resources and training data are scarce. While existing likelihood-based methods have shown promise, they typically aggregate token-level scores with uniform weights, neglecting the information-theoretic dynamics inherent in autoregressive generation. In this paper, we hypothesize and empirically validate that memorization signals are heavily skewed towards the high-entropy initial tokens, where model uncertainty is highest, and decay as context accumulates. To exploit this property, we introduce Positional Decay Reweighting (PDR), a training-free, plug-and-play framework that explicitly reweights token-level scores to amplify the distinctive signals from early positions while suppressing noise from later ones. Extensive experiments show that PDR serves as a robust prior and typically improves a wide range of advanced methods across multiple benchmarks.
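To make the reweighting idea concrete, the following is a minimal illustrative sketch of positional decay reweighting applied to per-token log-probabilities; the exponential decay form and the `decay_rate` parameter are assumptions for illustration only, and the paper's exact weighting scheme and base scoring function may differ.

```python
import numpy as np

def pdr_score(token_log_probs, decay_rate=0.3):
    """Positional decay reweighting (illustrative sketch, not the paper's exact scheme).

    Aggregates per-token scores (here, log-probabilities under the target model)
    with weights that decay with position, so early high-entropy tokens dominate
    the final membership score instead of being averaged uniformly.
    """
    scores = np.asarray(token_log_probs, dtype=float)
    positions = np.arange(len(scores))
    weights = np.exp(-decay_rate * positions)  # heavier weight on early tokens (assumed decay form)
    weights /= weights.sum()                   # normalize to a weighted average
    return float(np.dot(weights, scores))

# Usage: compare against the uniform-weight aggregation used by standard
# likelihood-based detectors on a toy sequence of per-token log-probabilities.
log_probs = [-6.2, -4.8, -1.1, -0.4, -0.3, -0.2]
print("uniform mean :", np.mean(log_probs))
print("PDR-weighted :", pdr_score(log_probs))
```

Because the reweighting operates only on the per-token scores that existing detectors already compute, it can in principle be layered on top of any of them without retraining, which is what "plug-and-play" refers to above.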