Some transformer attention heads appear to function as membership testers, dedicating themselves to answering the question "has this token appeared before in the context?" We identify these heads across four language models (GPT-2 small, medium, and large; Pythia-160M) and show that they form a spectrum of membership-testing strategies. Two heads (L0H1 and L0H5 in GPT-2 small) function as high-precision membership filters with false positive rates of 0-4\% even at 180 unique context tokens -- well above the $d_\text{head} = 64$ bit capacity of a classical Bloom filter. A third head (L1H11) shows the classic Bloom filter capacity curve: its false positive rate follows the theoretical formula $p \approx (1 - e^{-kn/m})^k$ with $R^2 = 1.0$ and fitted capacity $m \approx 5$ bits, saturating by $n \approx 20$ unique tokens. A fourth head initially identified as a Bloom filter (L3H0) was reclassified as a general prefix-attention head after confound controls revealed its apparent capacity curve was a sequence-length artifact. Together, the three genuine membership-testing heads form a multi-resolution system concentrated in early layers (0-1), taxonomically distinct from induction and previous-token heads, with false positive rates that decay monotonically with embedding distance -- consistent with distance-sensitive Bloom filters. These heads generalize broadly: they respond to any repeated token type, not just repeated names, with 43\% higher generalization than duplicate-token-only heads. Ablation reveals these heads contribute to both repeated and novel token processing, indicating that membership testing coexists with broader computational roles. The reclassification of L3H0 through confound controls strengthens rather than weakens the case: the surviving heads withstand the scrutiny that eliminated a false positive in our own analysis.
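The capacity claim for L1H11 rests on the standard Bloom filter false-positive formula quoted above. A minimal numeric sketch of that formula, using the fitted capacity $m \approx 5$ bits and taking $k = 1$ as an illustrative assumption (the abstract does not fix $k$), shows the saturation by $n \approx 20$ unique tokens:

```python
import math

def bloom_fp_rate(n: int, m: float, k: int = 1) -> float:
    """Theoretical Bloom filter false-positive rate:
    p = (1 - e^{-k n / m})^k, for n inserted items, m bits, k hashes."""
    return (1.0 - math.exp(-k * n / m)) ** k

# m = 5 bits is the fitted capacity reported for L1H11;
# k = 1 is an illustrative choice, not a value from the paper.
for n in (1, 5, 10, 20, 50):
    print(f"n={n:3d}  p={bloom_fp_rate(n, m=5):.3f}")
```

With these values the predicted rate climbs steeply (roughly 0.63 at $n = 5$, 0.87 at $n = 10$) and is effectively saturated near 1 by $n = 20$, matching the saturation point described for L1H11.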