The results of training a neural network depend heavily on the architecture chosen, and even a small change to its size typically requires restarting the training process. In contrast, we begin training with a small architecture, increase its capacity only as the problem requires, and avoid interfering with previous optimization while doing so. To this end, we introduce a natural gradient based approach which intuitively expands both the width and depth of a neural network when this is likely to substantially reduce the hypothetical converged training loss. We prove an upper bound on the ``rate'' at which neurons are added, and a computationally cheap lower bound on the expansion score. We illustrate the benefits of such Self-Expanding Neural Networks, in both fully connected and convolutional form, on classification and regression problems, including those where the appropriate architecture size is substantially uncertain a priori.
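For intuition, one hedged way such a natural gradient based expansion criterion could be instantiated is sketched below; the symbols $g$ (loss gradient), $F$ (Fisher information matrix), the score $\eta$, and the threshold $\tau$ are illustrative assumptions introduced here, not definitions taken from this abstract.
\[
\eta \;=\; g^{\top} F^{-1} g ,
\qquad
\text{expand when}\quad \eta_{\text{expanded}} \;>\; \tau \, \eta_{\text{current}}, \quad \tau > 1 .
\]
Under natural gradient descent, $\eta$ measures the instantaneous rate of loss reduction, so comparing it between the current network and a hypothetically enlarged one gives a plausible score for deciding when added width or depth is worthwhile.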