Mixture of Cognitive Reasoners: Modular Reasoning with Brain-Like Specialization

Human cognitive behavior arises from the interaction of specialized brain networks dedicated to distinct functions, such as language, logic, and social reasoning. Inspired by this organization, we propose Mixture of Cognitive Reasoners (MiCRo): a modular, transformer-based architecture post-trained with a curriculum that induces functional specialization across experts. Concretely, we partition the layers of a pretrained language model into four expert modules aligned with well-studied cognitive networks in the human brain. MiCRo offers three key advantages over standard language models. (1) The specialized experts are interpretable and causally meaningful -- ablating a module causes substantial drops on benchmarks requiring its specialized domain. (2) MiCRo's behavior can be dynamically steered at inference time by routing tokens to particular experts (e.g., favoring social over logical reasoning), enabling fine-grained control over outputs. (3) MiCRo outperforms or matches comparable baselines on both machine-learning reasoning benchmarks (e.g., GSM8K, BBH) and alignment to human behavior (CogBench), while maintaining interpretability. Taken together, cognitively grounded functional specialization yields models that are both more human-like and more human-interpretable.
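The ablation and steering described in points (1) and (2) can be illustrated with a small routing sketch. This is not the authors' implementation: the abstract does not specify the router, so top-1 per-token routing, the additive steering bias, and the function name `route` are all assumptions made for illustration. The abstract names language, logic, and social reasoning as example functions but does not name the fourth expert, so `"other"` is a placeholder label.

```python
import math

# Expert labels: the first three follow the abstract's examples; the fourth
# is not named there, so "other" is a placeholder (assumption).
EXPERTS = ["language", "logic", "social", "other"]


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def route(logits, ablated=(), bias=None):
    """Pick one expert for a token (top-1 routing is an assumption).

    logits  -- one router score per expert, in EXPERTS order
    ablated -- expert names that may not be selected
    bias    -- optional {expert name: additive score offset} used for steering
    """
    bias = bias or {}
    adjusted = []
    for name, score in zip(EXPERTS, logits):
        if name in ablated:
            # An ablated expert gets a score of -inf, hence probability 0.
            adjusted.append(float("-inf"))
        else:
            adjusted.append(score + bias.get(name, 0.0))
    if all(a == float("-inf") for a in adjusted):
        raise ValueError("at least one expert must remain active")
    probs = softmax(adjusted)
    best = max(range(len(EXPERTS)), key=lambda i: probs[i])
    return EXPERTS[best], probs


# Hypothetical router scores for a single token.
token_logits = [0.2, 1.5, 1.1, -0.3]

print(route(token_logits)[0])                         # logic
print(route(token_logits, ablated={"logic"})[0])      # social
print(route(token_logits, bias={"social": 1.0})[0])   # social
```

Masking an expert's score removes it from selection, which mirrors the ablation experiments in spirit. Adding a bias shifts tokens toward a preferred expert, which mirrors inference-time steering such as favoring social over logical reasoning.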

