The success of deep learning comes at a tremendous computational and energy cost, and the scalability of training massively overparametrized neural networks is becoming a real barrier to the progress of artificial intelligence (AI). Despite the popularity and low cost-per-iteration of traditional backpropagation via gradient descent, stochastic gradient descent (SGD) has a prohibitive convergence rate in non-convex settings, both in theory and in practice. To mitigate this cost, recent works have proposed to employ alternative (Newton-type) training methods with a much faster convergence rate, albeit at a higher cost-per-iteration. For a typical neural network with $m=\mathrm{poly}(n)$ parameters and an input batch of $n$ datapoints in $\mathbb{R}^d$, the previous work of [Brand, Peng, Song, and Weinstein, ITCS'2021] requires $\sim mnd + n^3$ time per iteration. In this paper, we present a novel training method that requires only $m^{1-\alpha} n d + n^3$ amortized time in the same overparametrized regime, where $\alpha \in (0.01,1)$ is some fixed constant. This method relies on a new alternative view of neural networks: as a set of binary search trees, where each iteration corresponds to modifying a small subset of the nodes in the trees. We believe this view will have further applications in the design and analysis of deep neural networks (DNNs).
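To make the tree-based view concrete, here is a minimal sketch, not the paper's actual construction: a max-segment tree over hypothetical per-neuron activation scores. It illustrates the two operations the abstract alludes to, namely that updating one neuron's score touches only $O(\log m)$ tree nodes, and that the neurons above an activation threshold can be enumerated while pruning entire subtrees whose maximum falls below it. All names and the scoring scheme are illustrative assumptions.

```python
# Illustrative sketch only (NOT the paper's data structure): a segment tree
# over m per-neuron scores. A point update touches O(log m) nodes; a
# threshold query prunes subtrees whose stored maximum is too small.

class MaxSegmentTree:
    """Binary tree over m leaf scores (m assumed a power of two for
    simplicity); each internal node stores the max of its subtree."""

    def __init__(self, scores):
        self.m = len(scores)
        self.tree = [float("-inf")] * (2 * self.m)
        self.tree[self.m:] = scores
        # Build internal nodes bottom-up.
        for i in range(self.m - 1, 0, -1):
            self.tree[i] = max(self.tree[2 * i], self.tree[2 * i + 1])

    def update(self, i, score):
        """Point update: only the O(log m) ancestors of leaf i change."""
        i += self.m
        self.tree[i] = score
        while i > 1:
            i //= 2
            self.tree[i] = max(self.tree[2 * i], self.tree[2 * i + 1])

    def above(self, threshold):
        """Indices of all leaves with score > threshold; subtrees whose
        max is <= threshold are skipped entirely."""
        out, stack = [], [1]
        while stack:
            node = stack.pop()
            if self.tree[node] <= threshold:
                continue
            if node >= self.m:          # leaf
                out.append(node - self.m)
            else:                       # internal: descend into children
                stack.extend((2 * node, 2 * node + 1))
        return out
```

For example, with scores `[0.1, 0.9, 0.3, 0.7]` and threshold `0.5`, only neurons 1 and 3 are reported, and after `update(0, 1.0)` a second query also reports neuron 0, having changed only the ancestors of leaf 0.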