Global $\mathcal{L}^2$ minimization at uniform exponential rate via geometrically adapted gradient descent in Deep Learning

We consider the scenario of supervised learning in Deep Learning (DL) networks, and exploit the arbitrariness of choice in the Riemannian metric relative to which the gradient descent flow can be defined (a general fact of differential geometry). In the standard approach to DL, the gradient flow on the space of parameters (weights and biases) is defined with respect to the Euclidean metric. Here instead, we choose the gradient flow with respect to the Euclidean metric in the output layer of the DL network. This naturally induces two modified versions of the gradient descent flow in the parameter space, one adapted for the overparametrized setting, and the other for the underparametrized setting. In the overparametrized case, we prove that, provided that a rank condition holds, all orbits of the modified gradient descent drive the ${\mathcal L}^2$ cost to its global minimum at a uniform exponential convergence rate; one thereby obtains an a priori stopping time for any prescribed proximity to the global minimum. We point out relations of the latter to sub-Riemannian geometry. Moreover, we generalize the above framework to the situation in which the rank condition does not hold; in particular, we show that local equilibria can only exist if a rank loss occurs, and that generically, they are not isolated points, but elements of a critical submanifold of parameter space.

翻译：我们考虑深度学习（DL）网络中的监督学习场景，并利用黎曼度量选择的任意性来定义梯度下降流（这是微分几何的一个普遍事实）。在DL的标准方法中，参数空间（权重和偏置）上的梯度流是基于欧几里得度量定义的。本文则选择基于DL网络输出层欧几里得度量的梯度流。这自然引出了参数空间中梯度下降流的两种修正版本：一种适用于过参数化情形，另一种适用于欠参数化情形。在过参数化情况下，我们证明：若秩条件成立，修正梯度下降的所有轨道均以均匀指数收敛速率将${\mathcal L}^2$代价驱动至其全局最小值；由此可预先确定任意指定逼近全局最小值的终止时间。我们指出该方法与子黎曼几何的联系。此外，我们将上述框架推广至秩条件不成立的情形：特别地，我们证明局部平衡态仅当秩损失发生时存在，且一般而言，这些平衡态并非孤立点，而是参数空间中临界子流形的元素。

相关内容

UniFormer

关注 1

《生成式模型: 变分自编码器与扩散模型》，75页ppt，Google DeepMind科学家Ruiqi Gao

专知会员服务

66+阅读 · 2023年6月10日

【CVPR 2022】一个完全无监督的框架，从噪声和部分测量中学习图像，Robust Equivariant Imaging: a fully unsupervised framework for learning to image

专知会员服务

25+阅读 · 2022年3月3日

【NeurIPS2021】用于文本图表示学习的 GNN 嵌套 Transformer 模型：GraphFormers

专知会员服务

46+阅读 · 2021年11月24日

Query2box: 使用盒嵌入对向量空间中的知识图谱进行推理，Query2box: Reasoning over Knowledge Graphs in Vector Space Using Box Embeddings

专知会员服务

46+阅读 · 2020年5月11日

随机特征核近似综述: 算法与理论，Random Features for Kernel Approximation: A Survey in Algorithms, Theory, and Beyond

专知会员服务

33+阅读 · 2020年4月26日

【AI应用】Facebook-利用神经网络求解高等数学方程, Using neural networks to solve advanced mathematics equations

专知会员服务

34+阅读 · 2020年1月15日

Connections between Support Vector Machines, Wasserstein distance and gradient-penalty GANs

专知会员服务

37+阅读 · 2019年10月17日

ExBert — 可视化分析Transformer学到的表示

专知会员服务

32+阅读 · 2019年10月16日

Keras François Chollet 《Deep Learning with Python 》, 386页pdf

专知会员服务

164+阅读 · 2019年10月12日

【SIGGRAPH2019】TensorFlow 2.0深度学习计算机图形学应用

专知会员服务

41+阅读 · 2019年10月9日

Github项目推荐 | 股市预测的机器学习/深度学习模型/资源集锦

AI研习社

33+阅读 · 2019年4月18日

强化学习的Unsupervised Meta-Learning

CreateAMind

18+阅读 · 2019年1月7日

Unsupervised Learning via Meta-Learning

CreateAMind

44+阅读 · 2019年1月3日

meta learning 17年：MAML SNAIL

CreateAMind

11+阅读 · 2019年1月2日

disentangled-representation-papers

CreateAMind

26+阅读 · 2018年9月12日

论文浅尝 | 嵌入常识知识的注意力 LSTM 模型用于特定目标的基于侧面的情感分析

开放知识图谱

28+阅读 · 2018年6月11日

Focal Loss for Dense Object Detection

统计学习与视觉计算组

12+阅读 · 2018年3月15日

可解释的CNN

CreateAMind

18+阅读 · 2017年10月5日

IJCAI | Cascade Dynamics Modeling with Attention-based RNN

KingsGarden

13+阅读 · 2017年7月16日

Layer Normalization原理及其TensorFlow实现

深度学习每日摘要

32+阅读 · 2017年6月17日

Hamilton-Jacibi方程的弱KAM理论

国家自然科学基金

2+阅读 · 2017年12月31日

Musielak-Orlicz-Sobolev 空间中的迹嵌入及其应用

国家自然科学基金

2+阅读 · 2015年12月31日

Volterra积分微分方程的多区间Chebyshev和Legendre谱配置法

国家自然科学基金

0+阅读 · 2015年12月31日

罗巴代数的表示和罗巴代数在operad中的应用

国家自然科学基金

0+阅读 · 2015年12月31日

复多项式的核拓扑熵

国家自然科学基金

0+阅读 · 2015年12月31日

Schr？dinger-Poisson方程守恒DDG方法研究

国家自然科学基金

2+阅读 · 2015年12月31日

哈密尔顿系统及多体问题的周期解

国家自然科学基金

0+阅读 · 2014年12月31日

动态Gr？bner 基与GVW算法

国家自然科学基金

0+阅读 · 2014年12月31日

Poisson流形上的修正Hamilton方法

国家自然科学基金

0+阅读 · 2014年12月31日

仿拓扑群中的三空间问题

国家自然科学基金

0+阅读 · 2014年12月31日

Uniform $\mathcal{H}$-matrix Compression with Applications to Boundary Integral Equations

Arxiv

0+阅读 · 2024年5月24日

Invariant subspaces and PCA in nearly matrix multiplication time

Arxiv

0+阅读 · 2024年5月24日

Learning to optimize: A tutorial for continuous and mixed-integer optimization

Arxiv

0+阅读 · 2024年5月24日

$Novel $H^\mathrm{dev}(\mathrm{Curl})$-conforming elements on regular triangulations and Clough--Tocher splits for the planar relaxed micromorphic model$

Novel $H^\mathrm{dev}(\mathrm{Curl})$-conforming elements on regular triangulations and Clough--Tocher splits for the planar relaxed micromorphic model

Arxiv

0+阅读 · 2024年5月23日

Monoidal bicategories, differential linear logic, and analytic functors

Arxiv

0+阅读 · 2024年5月23日

Learning heavy-tailed distributions with Wasserstein-proximal-regularized $α$-divergences

Arxiv

0+阅读 · 2024年5月22日

An $\ell^1$-Plug-and-Play Approach for MPI Using a Zero Shot Denoiser with Evaluation on the 3D Open MPI Dataset

Arxiv

0+阅读 · 2024年5月22日