Deep active learning has previously been explored for selecting in-context examples for large language models (LLMs), but not with methods that exploit recent advances in the understanding of transformer activations. In this paper, we test the hypothesis that model activations can provide a fine-grained signal for optimising the selection of in-context examples. We present the most comprehensive analysis to date of MLP activation-based deep active learning methods applied to in-context learning, including how different attention masking strategies affect active learning across diverse classification and generative datasets, using both the Llama-3.2-3B and Qwen2.5-3B base models. Our result is negative: MLP outputs, viewed through the lenses of massive activations or their first four moments, do not correlate with example quality or task performance. Specifically, the absolute Spearman correlation coefficient is at most 0.33 across all tasks and models we tested, indicating that such activation-based sampling should not be used for in-context learning. We hypothesise that this may be due to superposition, whereby models represent more features than they have dimensions, which suggests that methods such as Sparse Autoencoders (SAEs) may be a promising future direction.
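To make the reported statistic concrete, the following is a minimal sketch of how one might summarise an activation vector by four moments and then measure the absolute Spearman correlation between each moment and a per-example quality score. It is not the pipeline used in this paper. The helper names (`four_moments`, `average_ranks`, `spearman`) are hypothetical, the "activations" and "quality" values are synthetic random numbers, and reading "the first four moments" as mean, variance, skewness and excess kurtosis is an assumption.

```python
import math
import random


def four_moments(values):
    """Return (mean, variance, skewness, excess kurtosis) of a flat list of floats."""
    n = len(values)
    mean = sum(values) / n
    centred = [v - mean for v in values]
    var = sum(c ** 2 for c in centred) / n  # population variance
    if var == 0.0:
        # Constant vector: standardised moments are undefined, so report 0.
        return mean, 0.0, 0.0, 0.0
    std = math.sqrt(var)
    skew = sum((c / std) ** 3 for c in centred) / n
    kurt = sum((c / std) ** 4 for c in centred) / n - 3.0
    return mean, var, skew, kurt


def average_ranks(xs):
    """Rank values from 1 to n, giving tied values their average rank."""
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    ranks = [0.0] * len(xs)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the whole run of values tied with xs[order[i]].
        while j + 1 < len(order) and xs[order[j + 1]] == xs[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman(xs, ys):
    """Spearman's rho, computed as the Pearson correlation of the rank vectors."""
    rx, ry = average_ranks(xs), average_ranks(ys)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    norm = math.sqrt(
        sum((a - mx) ** 2 for a in rx) * sum((b - my) ** 2 for b in ry)
    )
    return cov / norm if norm else 0.0


# Synthetic stand-ins (assumption): one random "activation" vector and one
# random "quality" score per candidate in-context example.
rng = random.Random(0)
activations = [[rng.gauss(0, 1) for _ in range(256)] for _ in range(50)]
quality = [rng.random() for _ in range(50)]

moments = [four_moments(a) for a in activations]
for idx, name in enumerate(["mean", "variance", "skewness", "kurtosis"]):
    rho = spearman([m[idx] for m in moments], quality)
    print(f"{name}: |rho| = {abs(rho):.2f}")
```

Rank-based correlation is a natural fit here because it only asks whether a higher moment value tends to accompany a higher quality score, without assuming a linear relationship between the two.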