The Few Govern the Many:Unveiling Few-Layer Dominance for Time Series Models - 专知论文

会员服务 ·

0

序列 · 时间序列 · 序列模型 · 时间序列模型 · 准确率 ·

2025 年 11 月 10 日

The Few Govern the Many:Unveiling Few-Layer Dominance for Time Series Models

翻译：少数层主导多数：揭示时间序列模型中的少数层主导现象

Xin Qiu,Junlong Tong,Yirong Sun,Yunpu Ma,Xiaoyu Shen

Large-scale models are at the forefront of time series (TS) forecasting, dominated by two paradigms: fine-tuning text-based Large Language Models (LLM4TS) and training Time Series Foundation Models (TSFMs) from scratch. Both approaches share a foundational assumption that scaling up model capacity and data volume leads to improved performance. However, we observe a \textit{\textbf{scaling paradox}} in TS models, revealing a puzzling phenomenon that larger models do \emph{NOT} achieve better performance. Through extensive experiments on two model families across four scales (100M to 1.7B parameters) and diverse data (up to 6B observations), we rigorously confirm that the scaling paradox is a pervasive issue. We then diagnose its root cause by analyzing internal representations, identifying a phenomenon we call \textit{few-layer dominance}: only a small subset of layers are functionally important, while the majority are redundant, under-utilized, and can even distract training. Based on this discovery, we propose a practical method to automatically identify and retain only these dominant layers. In our models, retaining only 21\% of the parameters achieves up to a 12\% accuracy improvement and a 2.7$\times$ inference speedup. We validate the universality of our method on 8 prominent SOTA models (LLM4TS and TSFMs, 90M to 6B), showing that retaining less than 30\% of layers achieves comparable or superior accuracy in over 95\% of tasks.

翻译：大规模模型已成为时间序列预测的前沿技术，主要受两种范式主导：基于文本的大型语言模型的微调（LLM4TS）以及从头训练的时间序列基础模型（TSFM）。这两种方法均基于一个核心假设，即扩大模型容量和数据规模能提升性能。然而，我们在时间序列模型中观察到一个**规模悖论**，揭示了一个令人困惑的现象：更大的模型**并未**实现更好的性能。通过对两个模型家族在四种规模（1亿至17亿参数）和多样化数据（高达60亿观测值）上的大量实验，我们严格证实了规模悖论是一个普遍存在的问题。随后，我们通过分析内部表示来诊断其根本原因，发现了一种称为**少数层主导**的现象：仅有少数层在功能上至关重要，而大多数层是冗余的、未充分利用的，甚至可能干扰训练。基于这一发现，我们提出了一种实用方法，能自动识别并仅保留这些主导层。在我们的模型中，仅保留21%的参数即可实现高达12%的准确率提升和2.7倍的推理加速。我们在8个主流SOTA模型（LLM4TS和TSFM，9000万至60亿参数）上验证了该方法的普适性，结果表明，在超过95%的任务中，保留少于30%的层即可达到相当或更优的准确率。

0

相关内容

数学上，序列是被排成一列的对象（或事件）；这样每个元素不是在其他元素之前，就是在其他元素之后。这里，元素之间的顺序非常重要。

UnHiPPO：面向不确定性的状态空间模型初始化方法

UnHiPPO：面向不确定性的状态空间模型初始化方法

专知会员服务

11+阅读 · 2025年6月6日

【超越消息传递:图神经网络的物理启发范式】Beyond Message Passing: a Physics-Inspired Paradigm for Graph Neural Networks

【超越消息传递:图神经网络的物理启发范式】Beyond Message Passing: a Physics-Inspired Paradigm for Graph Neural Networks

专知会员服务

17+阅读 · 2022年5月10日

【CVPR 2022】基于实例深度估计的统一深度感知全景分割 PanopticDepth: Per-Instance Depth Estimation for Unified Depth-Aware Panoptic Segmentation

【CVPR 2022】基于实例深度估计的统一深度感知全景分割 PanopticDepth: Per-Instance Depth Estimation for Unified Depth-Aware Panoptic Segmentation

专知会员服务

18+阅读 · 2022年3月19日

【罗切斯特Yuqian Zhang等书】从对称到几何:可处理的非凸问题，34页pdf，From Symmetry to Geometry: Tractable Nonconvex Problems

【罗切斯特Yuqian Zhang等书】从对称到几何:可处理的非凸问题，34页pdf，From Symmetry to Geometry: Tractable Nonconvex Problems

专知会员服务

20+阅读 · 2022年3月4日

【CMU-Yuejie Chi等干货书】满足低秩矩阵分解的非凸优化综述，69页pdf，Nonconvex Optimization Meets Low-Rank Matrix Factorization: An Overview

【CMU-Yuejie Chi等干货书】满足低秩矩阵分解的非凸优化综述，69页pdf，Nonconvex Optimization Meets Low-Rank Matrix Factorization: An Overview

专知会员服务

33+阅读 · 2022年3月4日

【AAAI2022】TLogic:时序知识图谱上可解释链接预测的时间逻辑规则

【AAAI2022】TLogic:时序知识图谱上可解释链接预测的时间逻辑规则

专知会员服务

58+阅读 · 2021年12月16日

Time2Vec：学习时间的向量表示，Time2Vec: Learning a Vector Representation of Time

Time2Vec：学习时间的向量表示，Time2Vec: Learning a Vector Representation of Time

专知会员服务

36+阅读 · 2020年5月10日

语义相似性算法演化论文，29页pdf，Evolution of Semantic Similarity - A Survey

语义相似性算法演化论文，29页pdf，Evolution of Semantic Similarity - A Survey

专知会员服务

44+阅读 · 2020年4月30日

【IJCAI2020】统计相关模型，A Complete Characterization of Projectivity for Statistical Relational Models

【IJCAI2020】统计相关模型，A Complete Characterization of Projectivity for Statistical Relational Models

专知会员服务

21+阅读 · 2020年4月25日

基于动态时空图CNNs的交通流预测，Dynamic Spatio-temporal Graph-based CNNs for Traffic Flow Prediction

基于动态时空图CNNs的交通流预测，Dynamic Spatio-temporal Graph-based CNNs for Traffic Flow Prediction

专知会员服务

136+阅读 · 2020年3月8日

【CVPR2023】探索和利用不确定性的不完整多视角分类

【CVPR2023】探索和利用不确定性的不完整多视角分类

专知

43+阅读 · 2023年4月13日

【ICML2021】因果匹配领域泛化

【ICML2021】因果匹配领域泛化

专知

12+阅读 · 2021年8月12日

【KDD2020-Tutorial】因果推理与稳定学习，Causal Inference and Stable Learning

【KDD2020-Tutorial】因果推理与稳定学习，Causal Inference and Stable Learning

专知

11+阅读 · 2020年8月28日

【CVPR2020-旷视】DPGN：分布传播图网络的小样本学习

【CVPR2020-旷视】DPGN：分布传播图网络的小样本学习

专知

13+阅读 · 2020年4月1日

图机器学习 2.2-2.4 Properties of Networks, Random Graph

图机器学习 2.2-2.4 Properties of Networks, Random Graph

图与推荐

10+阅读 · 2020年3月28日

【NeurIPS2019】图变换网络：Graph Transformer Network

【NeurIPS2019】图变换网络：Graph Transformer Network

专知

245+阅读 · 2019年11月18日

误差反向传播——CNN

误差反向传播——CNN

统计学习与视觉计算组

31+阅读 · 2018年7月12日

CosFace: Large Margin Cosine Loss for Deep Face Recognition论文笔记

CosFace: Large Margin Cosine Loss for Deep Face Recognition论文笔记

统计学习与视觉计算组

44+阅读 · 2018年4月25日

LibRec 每周算法：LDA主题模型

LibRec 每周算法：LDA主题模型

LibRec智能推荐

29+阅读 · 2017年12月4日

LibRec 每周算法：DeepFM

LibRec 每周算法：DeepFM

LibRec智能推荐

14+阅读 · 2017年11月6日

半线性广义Tricomi方程Cauchy问题解的生命跨度估计研究

国家自然科学基金

0+阅读 · 2017年12月31日

含非正态及缺失数据的结构方程模型分析

国家自然科学基金

0+阅读 · 2015年12月31日

分布式有监督学习的学习理论

国家自然科学基金

17+阅读 · 2015年12月31日

三类多尺度问题的多尺度算法

国家自然科学基金

1+阅读 · 2015年12月31日

Choquet期望下极限定理及其收敛速度的刻画

国家自然科学基金

0+阅读 · 2015年12月31日

Schr？dinger-Poisson方程守恒DDG方法研究

国家自然科学基金

2+阅读 · 2015年12月31日

图的随机p-中心和中位问题的理论和算法研究

国家自然科学基金

1+阅读 · 2014年12月31日

一般误差分布下若干半参数模型的复合分位数方法

国家自然科学基金

0+阅读 · 2014年12月31日

Poisson流形上的修正Hamilton方法

国家自然科学基金

0+阅读 · 2014年12月31日

变换结构方程模型的非参数贝叶斯分析

国家自然科学基金

4+阅读 · 2014年12月31日

DeepSeek-V3 Technical Report

Arxiv

18+阅读 · 2024年12月27日

Large Models for Time Series and Spatio-Temporal Data: A Survey and Outlook

Arxiv

18+阅读 · 2023年10月16日

NeuralField-LDM: Scene Generation with Hierarchical Latent Diffusion Models

Arxiv

43+阅读 · 2023年4月19日

A Comprehensive Survey on Deep Graph Representation Learning

Arxiv

111+阅读 · 2023年4月11日

On Efficient Training of Large-Scale Deep Learning Models: A Literature Review

Arxiv

232+阅读 · 2023年4月7日

A Survey on Graph Diffusion Models: Generative AI in Science for Molecule, Protein and Material

Arxiv

88+阅读 · 2023年4月4日

A Survey of Large Language Models

A Survey of Large Language Models

Arxiv

501+阅读 · 2023年3月31日

ChatGPT is a Knowledgeable but Inexperienced Solver: An Investigation of Commonsense Problem in Large Language Models

Arxiv

64+阅读 · 2023年3月29日

Graph Anomaly Detection with Graph Neural Networks: Current Status and Challenges

Graph Anomaly Detection with Graph Neural Networks: Current Status and Challenges

Arxiv

22+阅读 · 2022年9月29日

Knowledge Distillation and Student-Teacher Learning for Visual Intelligence: A Review and New Outlooks

Arxiv

18+阅读 · 2021年6月17日

VIP会员

文章信息

相关主题

时间序列模型

最新内容

对抗环境下超视距目标打击的情报支援

对抗环境下超视距目标打击的情报支援

专知会员服务

8+阅读 · 7月22日

《面向复杂地形下无人机跟踪地面机器人（UAV–UGV）的自适应多滤波器扩展卡尔曼滤波框架》

《面向复杂地形下无人机跟踪地面机器人（UAV–UGV）的自适应多滤波器扩展卡尔曼滤波框架》

专知会员服务

3+阅读 · 7月22日

纵深侦察：大规模作战行动中远程侦察与监视之迫切需求

纵深侦察：大规模作战行动中远程侦察与监视之迫切需求

专知会员服务

7+阅读 · 7月22日

共享认知，分布式研判：复杂行动中的美国空军指挥控制（万字长文）

共享认知，分布式研判：复杂行动中的美国空军指挥控制（万字长文）

专知会员服务

6+阅读 · 7月22日

《无人机对海面作战影响评估》

《无人机对海面作战影响评估》

专知会员服务

15+阅读 · 7月21日

《可损耗无人系统规模化应用对美国军事转型的战略影响（2022-2030）》2026年270页

《可损耗无人系统规模化应用对美国军事转型的战略影响（2022-2030）》2026年270页

专知会员服务

12+阅读 · 7月21日

博士论文 | 后训练如何损害大模型生成多样性？SimpleStrat与Stylus

博士论文 | 后训练如何损害大模型生成多样性？SimpleStrat与Stylus

专知会员服务

4+阅读 · 7月21日

综述 | 面向5G/6G网络的LLM智能体AI：架构、协议与标准化

综述 | 面向5G/6G网络的LLM智能体AI：架构、协议与标准化

专知会员服务

6+阅读 · 7月21日

五角大楼新设无人机办公室（DRPM-UxS）将如何重塑美国无人系统格局（附美国防部设立备忘录）

五角大楼新设无人机办公室（DRPM-UxS）将如何重塑美国无人系统格局（附美国防部设立备忘录）

专知会员服务

9+阅读 · 7月21日

印度精确打击与指挥架构的断层

印度精确打击与指挥架构的断层

专知会员服务

7+阅读 · 7月20日

《NASA喷气推进实验室：高耐久轻质常驻空观测系统（HELIOS）》429页

《NASA喷气推进实验室：高耐久轻质常驻空观测系统（HELIOS）》429页

专知会员服务

9+阅读 · 7月20日

美空军AI完成F-16战斗机自主空战历史性试飞

美空军AI完成F-16战斗机自主空战历史性试飞

专知会员服务

8+阅读 · 7月20日

《美政府问责局——武器系统年度评估（2026年）：强制要求成熟技术或可推动转向快速交付》249页

《美政府问责局——武器系统年度评估（2026年）：强制要求成熟技术或可推动转向快速交付》249页

专知会员服务

10+阅读 · 7月20日

《美国陆军：通过弹性分布式模型库实现自适应AI优势》

《美国陆军：通过弹性分布式模型库实现自适应AI优势》

专知会员服务

9+阅读 · 7月20日

博士论文 | 理解与改进大语言模型推理：从反转诅咒到连续思维链

博士论文 | 理解与改进大语言模型推理：从反转诅咒到连续思维链

专知会员服务

10+阅读 · 7月20日

相关VIP内容

UnHiPPO：面向不确定性的状态空间模型初始化方法

UnHiPPO：面向不确定性的状态空间模型初始化方法

专知会员服务

11+阅读 · 2025年6月6日

【超越消息传递:图神经网络的物理启发范式】Beyond Message Passing: a Physics-Inspired Paradigm for Graph Neural Networks

【超越消息传递:图神经网络的物理启发范式】Beyond Message Passing: a Physics-Inspired Paradigm for Graph Neural Networks

专知会员服务

17+阅读 · 2022年5月10日

【CVPR 2022】基于实例深度估计的统一深度感知全景分割 PanopticDepth: Per-Instance Depth Estimation for Unified Depth-Aware Panoptic Segmentation

【CVPR 2022】基于实例深度估计的统一深度感知全景分割 PanopticDepth: Per-Instance Depth Estimation for Unified Depth-Aware Panoptic Segmentation

专知会员服务

18+阅读 · 2022年3月19日

【罗切斯特Yuqian Zhang等书】从对称到几何:可处理的非凸问题，34页pdf，From Symmetry to Geometry: Tractable Nonconvex Problems

【罗切斯特Yuqian Zhang等书】从对称到几何:可处理的非凸问题，34页pdf，From Symmetry to Geometry: Tractable Nonconvex Problems

专知会员服务

20+阅读 · 2022年3月4日

【CMU-Yuejie Chi等干货书】满足低秩矩阵分解的非凸优化综述，69页pdf，Nonconvex Optimization Meets Low-Rank Matrix Factorization: An Overview

【CMU-Yuejie Chi等干货书】满足低秩矩阵分解的非凸优化综述，69页pdf，Nonconvex Optimization Meets Low-Rank Matrix Factorization: An Overview

专知会员服务

33+阅读 · 2022年3月4日

【AAAI2022】TLogic:时序知识图谱上可解释链接预测的时间逻辑规则

【AAAI2022】TLogic:时序知识图谱上可解释链接预测的时间逻辑规则

专知会员服务

58+阅读 · 2021年12月16日

Time2Vec：学习时间的向量表示，Time2Vec: Learning a Vector Representation of Time

Time2Vec：学习时间的向量表示，Time2Vec: Learning a Vector Representation of Time

专知会员服务

36+阅读 · 2020年5月10日

语义相似性算法演化论文，29页pdf，Evolution of Semantic Similarity - A Survey

语义相似性算法演化论文，29页pdf，Evolution of Semantic Similarity - A Survey

专知会员服务

44+阅读 · 2020年4月30日

【IJCAI2020】统计相关模型，A Complete Characterization of Projectivity for Statistical Relational Models

【IJCAI2020】统计相关模型，A Complete Characterization of Projectivity for Statistical Relational Models

专知会员服务

21+阅读 · 2020年4月25日

基于动态时空图CNNs的交通流预测，Dynamic Spatio-temporal Graph-based CNNs for Traffic Flow Prediction

基于动态时空图CNNs的交通流预测，Dynamic Spatio-temporal Graph-based CNNs for Traffic Flow Prediction

专知会员服务

136+阅读 · 2020年3月8日

热门VIP内容

开通专知VIP会员享更多权益服务

《面向复杂地形下无人机跟踪地面机器人（UAV–UGV）的自适应多滤波器扩展卡尔曼滤波框架》

共享认知，分布式研判：复杂行动中的美国空军指挥控制（万字长文）

对抗环境下超视距目标打击的情报支援

纵深侦察：大规模作战行动中远程侦察与监视之迫切需求

相关资讯

【CVPR2023】探索和利用不确定性的不完整多视角分类

【CVPR2023】探索和利用不确定性的不完整多视角分类

专知

43+阅读 · 2023年4月13日

【ICML2021】因果匹配领域泛化

【ICML2021】因果匹配领域泛化

专知

12+阅读 · 2021年8月12日

【KDD2020-Tutorial】因果推理与稳定学习，Causal Inference and Stable Learning

【KDD2020-Tutorial】因果推理与稳定学习，Causal Inference and Stable Learning

专知

11+阅读 · 2020年8月28日

【CVPR2020-旷视】DPGN：分布传播图网络的小样本学习

【CVPR2020-旷视】DPGN：分布传播图网络的小样本学习

专知

13+阅读 · 2020年4月1日

图机器学习 2.2-2.4 Properties of Networks, Random Graph

图机器学习 2.2-2.4 Properties of Networks, Random Graph

图与推荐

10+阅读 · 2020年3月28日

【NeurIPS2019】图变换网络：Graph Transformer Network

【NeurIPS2019】图变换网络：Graph Transformer Network

专知

245+阅读 · 2019年11月18日

误差反向传播——CNN

误差反向传播——CNN

统计学习与视觉计算组

31+阅读 · 2018年7月12日

CosFace: Large Margin Cosine Loss for Deep Face Recognition论文笔记

CosFace: Large Margin Cosine Loss for Deep Face Recognition论文笔记

统计学习与视觉计算组

44+阅读 · 2018年4月25日

LibRec 每周算法：LDA主题模型

LibRec 每周算法：LDA主题模型

LibRec智能推荐

29+阅读 · 2017年12月4日

LibRec 每周算法：DeepFM

LibRec 每周算法：DeepFM

LibRec智能推荐

14+阅读 · 2017年11月6日

相关论文

DeepSeek-V3 Technical Report

Arxiv

18+阅读 · 2024年12月27日

Large Models for Time Series and Spatio-Temporal Data: A Survey and Outlook

Arxiv

18+阅读 · 2023年10月16日

NeuralField-LDM: Scene Generation with Hierarchical Latent Diffusion Models

Arxiv

43+阅读 · 2023年4月19日

A Comprehensive Survey on Deep Graph Representation Learning

Arxiv

111+阅读 · 2023年4月11日

On Efficient Training of Large-Scale Deep Learning Models: A Literature Review

Arxiv

232+阅读 · 2023年4月7日

A Survey on Graph Diffusion Models: Generative AI in Science for Molecule, Protein and Material

Arxiv

88+阅读 · 2023年4月4日

A Survey of Large Language Models

A Survey of Large Language Models

Arxiv

501+阅读 · 2023年3月31日

ChatGPT is a Knowledgeable but Inexperienced Solver: An Investigation of Commonsense Problem in Large Language Models

Arxiv

64+阅读 · 2023年3月29日

Graph Anomaly Detection with Graph Neural Networks: Current Status and Challenges

Graph Anomaly Detection with Graph Neural Networks: Current Status and Challenges

Arxiv

22+阅读 · 2022年9月29日

Knowledge Distillation and Student-Teacher Learning for Visual Intelligence: A Review and New Outlooks

Arxiv

18+阅读 · 2021年6月17日

相关基金

半线性广义Tricomi方程Cauchy问题解的生命跨度估计研究

国家自然科学基金

0+阅读 · 2017年12月31日

含非正态及缺失数据的结构方程模型分析

国家自然科学基金

0+阅读 · 2015年12月31日

分布式有监督学习的学习理论

国家自然科学基金

17+阅读 · 2015年12月31日

三类多尺度问题的多尺度算法

国家自然科学基金

1+阅读 · 2015年12月31日

Choquet期望下极限定理及其收敛速度的刻画

国家自然科学基金

0+阅读 · 2015年12月31日

Schr？dinger-Poisson方程守恒DDG方法研究

国家自然科学基金

2+阅读 · 2015年12月31日

图的随机p-中心和中位问题的理论和算法研究

国家自然科学基金

1+阅读 · 2014年12月31日

一般误差分布下若干半参数模型的复合分位数方法

国家自然科学基金

0+阅读 · 2014年12月31日

Poisson流形上的修正Hamilton方法

国家自然科学基金

0+阅读 · 2014年12月31日

变换结构方程模型的非参数贝叶斯分析

国家自然科学基金

4+阅读 · 2014年12月31日

微信扫码咨询专知VIP会员