Does the Same Token Mean the Same State? MoE Routing as Signal for Reasoning Control

Kang Chen, Minshen Yu, Junjie Nian, Yaoning Wang, Yixin Cao, Yugang Jiang

In sparse Mixture-of-Experts language models, does the same token id imply the same router state and the same experts producing it? Holding the emitted token id fixed at repeated anchors, we find that it does not: the experts that produce the token still separate task context, trajectory history, and reasoning-effort mode. This residual structure supports test-time control: near boundary anchors (the final-response transition) and delimiter anchors (which open the answer, e.g. \boxed{ or code fences), routing neighborhoods already align with final-answer basins at a marker-only readout, and the alignment is strongest when the routing is read at the answer opening. We operationalize this as RAD (Routing Agreement Decoding), an answer-string-free multi-rollout selector: it locates a fixed anchor, represents each rollout by its anchor-window MoE routing states, and returns the densest Weighted-Jaccard K-NN route-basin center, without parsing, normalizing, executing, or voting over answer strings. Across 10 sparse-MoE configurations (gpt-oss, Qwen3-MoE) and 6 datasets spanning math, GPQA, and code, RAD is on par with Majority where string voting is well-posed, with small positive paired deltas (RAD 73.9 / RAD+DC 74.2 vs. Majority 73.6). Like majority voting, RAD is not a verifier: a dense wrong basin can still win. Its value is the interface: the same selector gives direct pass@1 on code, where exact-string voting is ill-defined, and the same routing-density principle, re-anchored to the agentic boundary, improves best-of-16 patch selection on SWE-bench Verified over random selection, where patches have no answer string to vote on.
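
The selection step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not say how routing states are turned into features, how density is scored, or which K is used. The sketch assumes each rollout is summarized by a non-negative vector of per-expert activation counts over the anchor window, and that density is the mean Weighted-Jaccard similarity to the K nearest other rollouts. The names `weighted_jaccard` and `densest_rollout` and the toy counts are hypothetical.

```python
from typing import Sequence


def weighted_jaccard(a: Sequence[float], b: Sequence[float]) -> float:
    """Weighted Jaccard similarity: sum of element-wise minima over sum of maxima."""
    numerator = sum(min(x, y) for x, y in zip(a, b))
    denominator = sum(max(x, y) for x, y in zip(a, b))
    # Assumed convention: two all-zero vectors count as identical.
    return numerator / denominator if denominator > 0 else 1.0


def densest_rollout(features: Sequence[Sequence[float]], k: int = 2) -> int:
    """Return the index of the rollout with the highest K-NN density.

    Density here is the mean similarity to the k most similar other rollouts
    (an assumption made for this sketch). Ties go to the lowest index.
    """
    n = len(features)
    if n == 0:
        raise ValueError("need at least one rollout")
    if n == 1:
        return 0
    k = min(k, n - 1)
    best_index, best_density = 0, -1.0
    for i in range(n):
        similarities = sorted(
            (weighted_jaccard(features[i], features[j]) for j in range(n) if j != i),
            reverse=True,
        )
        density = sum(similarities[:k]) / k
        if density > best_density:
            best_index, best_density = i, density
    return best_index


# Hypothetical expert-activation counts for four rollouts over four experts.
rollouts = [
    [4, 0, 0, 0],
    [3, 1, 0, 0],
    [2, 2, 0, 0],
    [0, 0, 1, 3],
]
print(densest_rollout(rollouts, k=2))  # prints 1
```

In this toy input the first three rollouts route similarly and the fourth is an outlier, so the selector returns the rollout in the middle of the dense group. No answer string is inspected. As the abstract notes for RAD itself, a dense group of wrong rollouts would win in the same way.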
