Despite extensive study, the significance of sharpness -- the trace of the loss Hessian at local minima -- remains unclear. We investigate an alternative perspective: how sharpness relates to the geometric structure of neural representations, specifically representation compression, defined as how strongly neural activations concentrate under local input perturbations. We introduce three measures -- Local Volumetric Ratio (LVR), Maximum Local Sensitivity (MLS), and Local Dimensionality -- and derive upper bounds showing that these measures are mathematically constrained by sharpness: flatter minima necessarily limit compression. We extend these bounds to reparametrization-invariant sharpness and introduce network-wide variants (NMLS, NVR) that provide tighter, more stable bounds than prior single-layer analyses. Empirically, we observe consistent positive correlations between sharpness and compression across feedforward, convolutional, and transformer architectures. Our results suggest that sharpness fundamentally quantifies representation compression, offering a principled resolution to contradictory findings on the sharpness-generalization relationship.
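The two central quantities can be illustrated numerically. The sketch below is a minimal toy example, not the paper's method: it takes sharpness as the trace of the loss Hessian (estimated with the standard Hutchinson trace estimator) and assumes MLS is the largest singular value of a representation's input Jacobian; for the linear model used here both are available in closed form, which makes the estimator easy to check.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy linear regression: loss L(w) = mean_i (w @ x_i - y_i)^2.
n, d = 200, 5
X = rng.normal(size=(n, d))
w_true = rng.normal(size=d)
y = X @ w_true
w = w_true.copy()  # sit at the minimum, where the Hessian defines sharpness

# Sharpness as the trace of the loss Hessian. For this quadratic loss the
# Hessian is exactly H = 2 X^T X / n, so we can compute tr(H) directly.
H = 2.0 * X.T @ X / n
sharpness = np.trace(H)

# Hutchinson's estimator of tr(H): average v^T H v over random Rademacher
# probes v. In deep nets H @ v would come from a Hessian-vector product;
# here we use the explicit H.
m = 2000
vs = rng.choice([-1.0, 1.0], size=(m, d))
hutch = np.mean(np.einsum('md,de,me->m', vs, H, vs))

# Maximum Local Sensitivity (assumed definition): the largest singular
# value of the representation's input Jacobian. For a linear feature map
# h(x) = W x the Jacobian is just W.
W = rng.normal(size=(3, d))
mls = float(np.linalg.svd(W, compute_uv=False)[0])

print(sharpness, hutch, mls)
```

With 2000 probes the Hutchinson estimate agrees closely with the exact trace; in practice the estimator matters because full Hessians of deep networks are never materialized, only Hessian-vector products.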