The black-box nature of Large Language Models necessitates evaluation frameworks that transcend surface-level performance metrics. This study investigates the internal neural representations of cognitive complexity using Bloom's Taxonomy as a hierarchical lens. By analyzing high-dimensional activation vectors across several LLMs, we probe whether different cognitive levels, ranging from basic recall (Remember) to abstract synthesis (Create), are linearly separable within the models' residual streams. Linear classifiers achieve approximately 95% mean accuracy across all Bloom levels, providing strong evidence that cognitive level is encoded in a linearly accessible subspace of the models' representations. Our results further suggest that a model resolves the cognitive difficulty of a prompt early in the forward pass, with representations becoming increasingly separable across layers.
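The probing setup described above can be illustrated with a minimal sketch. All of the following is hypothetical: the synthetic vectors stand in for real residual-stream activations (which would normally be extracted from an LLM's hidden states), the two classes stand in for two Bloom levels, and the perceptron stands in for whatever linear classifier the study actually used. The point is only to show what "linearly separable in activation space" means operationally.

```python
# Hypothetical linear-probe sketch: synthetic "activations" replace real
# residual-stream vectors; a simple perceptron plays the linear classifier.
import random

random.seed(0)
DIM = 16  # toy activation dimensionality (real residual streams are far larger)

def synth_activation(level, noise=0.5):
    # Assumption: each cognitive level shifts the mean along one direction,
    # so the two classes are (approximately) linearly separable.
    return [2.0 * level + random.gauss(0, noise) if i == 0
            else random.gauss(0, noise)
            for i in range(DIM)]

def make_dataset(n_per_level=50, levels=(0, 1)):
    data = [(synth_activation(lv), lv) for lv in levels for _ in range(n_per_level)]
    random.shuffle(data)
    return data

def train_perceptron(data, epochs=20, lr=0.1):
    # Classic perceptron updates: nudge the weight vector toward
    # misclassified examples until the classes are separated.
    w, b = [0.0] * DIM, 0.0
    for _ in range(epochs):
        for x, y in data:
            pred = 1 if sum(wi * xi for wi, xi in zip(w, x)) + b > 0 else 0
            err = y - pred
            if err:
                w = [wi + lr * err * xi for wi, xi in zip(w, x)]
                b += lr * err
    return w, b

def accuracy(data, w, b):
    correct = sum(
        (1 if sum(wi * xi for wi, xi in zip(w, x)) + b > 0 else 0) == y
        for x, y in data
    )
    return correct / len(data)

train_set, test_set = make_dataset(), make_dataset()
w, b = train_perceptron(train_set)
print(f"probe accuracy: {accuracy(test_set, w, b):.2f}")
```

High held-out accuracy for such a probe is the operational evidence the abstract refers to: if a purely linear decision boundary recovers the class label from the activations, the label is encoded in a linearly accessible subspace. Repeating the fit layer by layer gives the "increasingly separable across layers" trajectory.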