Large language models (LLMs) have demonstrated impressive performance on a wide range of natural language processing tasks, yet their ability to perform multi-step logical reasoning remains an open challenge. Although Chain-of-Thought prompting improves logical reasoning by having models generate intermediate steps, it lacks mechanisms for assessing the coherence of the logical transitions between those steps. In this paper, we propose a novel, lightweight evaluation strategy for logical reasoning that uses query-key alignments inside transformer attention heads. Using a single forward pass, we extract a "QK-score" from carefully chosen heads; this score reveals latent representations that reliably separate valid from invalid inferences, offering a scalable alternative to traditional ablation-based techniques. We further provide empirical validation on multiple logical reasoning benchmarks, demonstrating that our evaluation method is more robust to distractors and to increased reasoning depth. The experiments were conducted on a diverse set of models ranging from 1.5B to 70B parameters.
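To make the core idea concrete, the following is a minimal sketch of computing a QK-style score from one attention head's query and key vectors, obtained from a single forward pass. This is an illustration of the general mechanism, not the authors' exact implementation: the function name `qk_score`, the choice of the final token's query, and the synthetic data are all assumptions for the example.

```python
import numpy as np

def qk_score(queries, keys, query_pos, cand_positions):
    """Score candidate answer positions by scaled query-key alignment
    in a single attention head (an illustrative sketch, not the
    paper's exact method).

    queries, keys: (seq_len, d_head) arrays for one chosen head.
    query_pos: index of the token whose query is read out
               (here, the final prompt token).
    cand_positions: indices of candidate answer tokens.
    """
    d = keys.shape[-1]
    # Raw scaled dot-product logits for the chosen query against all keys,
    # i.e. the pre-softmax attention scores of this head.
    logits = queries[query_pos] @ keys.T / np.sqrt(d)
    return {p: float(logits[p]) for p in cand_positions}

# Toy demonstration with synthetic query/key vectors (hypothetical data):
rng = np.random.default_rng(0)
seq_len, d_head = 6, 8
queries = rng.normal(size=(seq_len, d_head))
keys = rng.normal(size=(seq_len, d_head))
# Construct position 3 to align strongly with the final token's query,
# mimicking a "valid inference" token the head attends to.
keys[3] = queries[-1] * 10.0

scores = qk_score(queries, keys, query_pos=-1, cand_positions=[2, 3, 4])
best = max(scores, key=scores.get)
```

In practice the per-head queries and keys would come from instrumenting a transformer's attention layers during a forward pass (e.g. via hooks); the sketch above only shows how the alignment score itself separates the constructed "valid" candidate from the others.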