Multiple-choice question answering (MCQA) is easy to evaluate but adds a meta-task: models must both solve the problem and output the symbol that *represents* the answer, so evaluation conflates reasoning errors with symbol-binding failures. We study how language models implement MCQA internally using representational analyses (PCA, linear probes) and causal interventions. We find that residual states at option boundaries (newline tokens) often contain strong, linearly decodable signals related to per-option correctness. Winner-identity probing reveals a two-stage progression: the winning *content position* becomes decodable immediately after the final option is processed, while the *output symbol* is represented closer to the answer emission position. Tests under symbol and content permutations support a two-stage mechanism in which models first select a winner in content space and then bind or route that winner to the appropriate symbol to emit.
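To make "linearly decodable" concrete, the following is a minimal sketch of a winner-identity probe on synthetic data. Everything in it is an assumption made for illustration rather than the study's actual setup: the data generator `make_states`, the sizes `N_OPTIONS` and `D_MODEL`, the `signal` parameter, and the choice of a ridge least-squares probe are all hypothetical. It only shows the general pattern of fitting a linear map from hidden states to the winning option index and comparing held-out accuracy between a position with no winner signal and one with a winner signal.

```python
import numpy as np

rng = np.random.default_rng(0)
N_OPTIONS, D_MODEL = 4, 64  # hypothetical sizes, chosen for illustration

# One fixed direction per winning option. This stands in for whatever
# structure a real model's residual stream might carry (an assumption).
DIRECTIONS = rng.normal(size=(N_OPTIONS, D_MODEL))


def make_states(n, signal):
    """Synthetic 'residual states': Gaussian noise plus signal * class direction."""
    winners = rng.integers(0, N_OPTIONS, size=n)
    states = rng.normal(size=(n, D_MODEL)) + signal * DIRECTIONS[winners]
    return states, winners


def add_bias(x):
    """Append a constant column so the probe can learn an intercept."""
    return np.hstack([x, np.ones((x.shape[0], 1))])


def fit_probe(states, winners, l2=1e-2):
    """Ridge least-squares map from states to one-hot winner labels."""
    xb = add_bias(states)
    targets = np.eye(N_OPTIONS)[winners]
    gram = xb.T @ xb + l2 * np.eye(xb.shape[1])
    return np.linalg.solve(gram, xb.T @ targets)


def probe_accuracy(weights, states, winners):
    """Fraction of examples whose highest-scoring class is the true winner."""
    preds = np.argmax(add_bias(states) @ weights, axis=1)
    return float(np.mean(preds == winners))


def held_out_accuracy(signal, n_train=800, n_test=200):
    """Fit on one synthetic sample and evaluate on a fresh one."""
    x_tr, y_tr = make_states(n_train, signal)
    x_te, y_te = make_states(n_test, signal)
    return probe_accuracy(fit_probe(x_tr, y_tr), x_te, y_te)


# signal=0.0 mimics a position where the winner is not yet encoded;
# signal=1.0 mimics a position where it is linearly encoded.
acc = {
    "early_position": held_out_accuracy(signal=0.0),
    "late_position": held_out_accuracy(signal=1.0),
}
print(acc)
```

With real data, `states` would be residual activations taken at a chosen layer and token position, and `winners` would be the index of the correct option (or of the emitted symbol). Repeating the fit across positions is one way to trace where each quantity becomes decodable. Held-out accuracy near chance (here 1/4) suggests the information is not linearly available at that position.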