Deep learning neural network models must be large enough to adapt to their problem domain, yet small enough to avoid overfitting the training data during gradient descent. To balance these competing demands, overprovisioned deep learning models such as transformers are trained for a single epoch on large data sets, and are hence inefficient with both computing resources and training data. In response to these inefficiencies, we exploit learning theory to derive Occam Gradient Descent, an algorithm that interleaves adaptive reduction of model size, to minimize generalization error, with gradient descent on model weights, to minimize fitting error. In contrast, traditional gradient descent greedily minimizes fitting error without regard to generalization error. Our algorithm simultaneously descends the space of weights and the topological size of any neural network without modification. With respect to loss, compute, and model size, our experiments show that (a) on image classification benchmarks, linear and convolutional neural networks trained with Occam Gradient Descent outperform traditional gradient descent with or without post-training pruning; (b) on a range of tabular data classification tasks, neural networks trained with Occam Gradient Descent outperform traditional gradient descent, as well as Random Forests; (c) on natural language transformers, Occam Gradient Descent outperforms traditional gradient descent.
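The interleaving idea described above can be illustrated with a toy sketch: alternate gradient steps on the weights (minimizing fitting error) with removal of the smallest-magnitude weights (a stand-in for adaptive size reduction). This is a minimal illustration under assumed choices, not the paper's actual pruning criterion or schedule: the magnitude-based rule, the every-50-steps cadence, and the linear model are all assumptions for the sake of a runnable example.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic regression task: only the first 3 of 20 features matter.
n, d = 200, 20
X = rng.normal(size=(n, d))
true_w = np.zeros(d)
true_w[:3] = [2.0, -1.5, 1.0]
y = X @ true_w + 0.01 * rng.normal(size=n)

w = rng.normal(scale=0.1, size=d)
mask = np.ones(d, dtype=bool)   # active (unpruned) weights
lr = 0.05

for step in range(400):
    # Gradient descent on the active weights (minimize fitting error).
    grad = 2.0 / n * X.T @ (X @ w - y)
    w -= lr * grad * mask
    # Every 50 steps, prune the smallest-magnitude active weight
    # (illustrative size reduction; not the paper's criterion).
    if step % 50 == 49 and mask.sum() > 3:
        active = np.flatnonzero(mask)
        victim = active[np.argmin(np.abs(w[active]))]
        mask[victim] = False
        w[victim] = 0.0

print(int(mask.sum()))  # → 12 weights survive after 8 pruning rounds
```

Because the irrelevant weights are driven toward zero by gradient descent before each pruning round, the magnitude criterion removes only spurious parameters here, shrinking the model while the fit improves; this is the intuition behind descending model size and weights simultaneously.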