A central architectural question for both biological and artificial intelligence is whether judgment relies on specialized modules or a unified, domain-general resource. While the discovery of decodable neural representations for distinct concepts in Large Language Models (LLMs) has suggested a modular architecture, whether these representations are truly independent systems remains an open question. Here we provide evidence for a convergent architecture. Across a range of LLMs, we find that diverse evaluative judgments are computed along a dominant dimension, which we term the Valence-Assent Axis (VAA). This axis jointly encodes subjective valence ("what is good") and the model's assent to factual claims ("what is true"). Through direct interventions, we show this unified representation creates a critical dependency: the VAA functions as a control signal that steers the generative process to construct a rationale consistent with its evaluative state, even at the cost of factual accuracy. This mechanism, which we term the subordination of reasoning, shifts the process of reasoning from impartial inference toward goal-directed justification. Our discovery offers a mechanistic account for systemic bias and hallucination, revealing how an architecture that promotes coherent judgment can systematically undermine faithful reasoning.
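The abstract does not spell out the analysis pipeline, so the following is only a minimal sketch of the kind of procedure it implies: estimating a single dominant evaluative direction in a model's hidden states, then intervening on that direction to steer generation. The model name ("gpt2"), the layer index, the contrastive prompts, the difference-of-means estimator, and the activation-addition hook are all illustrative assumptions, not the paper's method.

```python
# Minimal sketch (illustrative, not the paper's code) of the two operations the
# abstract refers to: (1) estimating a single dominant "Valence-Assent" direction
# in a model's hidden states and (2) steering generation along that direction.
# Model, layer, prompts, and the difference-of-means estimator are assumptions.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "gpt2"          # placeholder model; any causal LM exposes hidden states
layer = 6                    # hypothetical layer at which the axis is read and edited
tok = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name, output_hidden_states=True)
model.eval()

def last_token_state(text: str) -> torch.Tensor:
    """Hidden state of the final token after transformer block `layer`."""
    ids = tok(text, return_tensors="pt")
    with torch.no_grad():
        out = model(**ids)
    # hidden_states[0] is the embedding output, so block `layer` output is index layer + 1
    return out.hidden_states[layer + 1][0, -1]

# (1) Candidate Valence-Assent direction: difference of mean activations between
#     positively and negatively valenced/assented statements (toy examples).
pos = ["The proposal is excellent.", "Yes, that claim is true."]
neg = ["The proposal is terrible.", "No, that claim is false."]
direction = (torch.stack([last_token_state(t) for t in pos]).mean(0)
             - torch.stack([last_token_state(t) for t in neg]).mean(0))
direction = direction / direction.norm()

# (2) Intervention: add the direction to the residual stream at that layer during
#     generation, then inspect whether the continuation shifts toward the steered
#     evaluative stance (the "control signal" behaviour the abstract describes).
def steering_hook(module, inputs, output, alpha=8.0):
    hidden = output[0] if isinstance(output, tuple) else output
    hidden = hidden + alpha * direction.to(hidden.dtype)
    return (hidden,) + output[1:] if isinstance(output, tuple) else hidden

handle = model.transformer.h[layer].register_forward_hook(steering_hook)
ids = tok("Overall, the plan is", return_tensors="pt")
print(tok.decode(model.generate(**ids, max_new_tokens=20)[0]))
handle.remove()
```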