We present Differentiable Neural Architectures (DNArch), a method that jointly learns the weights and the architecture of Convolutional Neural Networks (CNNs) by backpropagation. In particular, DNArch allows learning (i) the size of convolutional kernels at each layer, (ii) the number of channels at each layer, (iii) the position and values of downsampling layers, and (iv) the depth of the network. To this end, DNArch views neural architectures as continuous multidimensional entities, and uses learnable differentiable masks along each dimension to control their size. Unlike existing methods, DNArch is not limited to a predefined set of possible neural components, but instead it is able to discover entire CNN architectures across all combinations of kernel sizes, widths, depths and downsampling. Empirically, DNArch finds performant CNN architectures for several classification and dense prediction tasks on both sequential and image data. When combined with a loss term that considers the network complexity, DNArch finds powerful architectures that respect a predefined computational budget.
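To illustrate how a learnable differentiable mask can control the size of one architectural dimension, the following is a minimal sketch rather than the authors' implementation. It assumes a sigmoid-shaped mask over channel positions normalized to [0, 1]; the mask form, the names `soft_mask`, `mask_sum_grad` and `effective_size`, and the parameters `theta`, `temperature` and `cutoff` are all hypothetical choices made for illustration. The point is only that the sum of the mask has a nonzero derivative with respect to `theta`, so the size of the dimension can be adjusted by gradient descent.

```python
import math
from typing import List


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def soft_mask(n: int, theta: float, temperature: float = 25.0) -> List[float]:
    """Soft mask over n positions spread evenly on [0, 1].

    Assumed form (not taken from the source): positions well below
    `theta` get a value near 1, positions well above it a value near 0.
    """
    xs = [i / (n - 1) for i in range(n)]
    return [sigmoid(temperature * (theta - x)) for x in xs]


def mask_sum_grad(n: int, theta: float, temperature: float = 25.0) -> float:
    """Analytic derivative of sum(mask) with respect to theta."""
    mask = soft_mask(n, theta, temperature)
    return sum(temperature * m * (1.0 - m) for m in mask)


def effective_size(mask: List[float], cutoff: float = 0.1) -> int:
    """Number of positions whose mask value is at least `cutoff`."""
    return sum(1 for m in mask if m >= cutoff)


if __name__ == "__main__":
    n_channels = 16
    theta = 0.9   # hypothetical initial mask parameter
    lr = 0.01     # hypothetical learning rate

    # Toy objective: a complexity penalty equal to sum(mask).
    # Gradient descent on theta alone shrinks the active width.
    for step in range(5):
        width = effective_size(soft_mask(n_channels, theta))
        print(f"step {step}: theta={theta:.3f}, active channels={width}")
        theta -= lr * mask_sum_grad(n_channels, theta)
```

In a full model, the mask would multiply the channel activations, and the complexity penalty would be added to the task loss so that the two terms trade accuracy against size. The same idea could in principle be applied to kernel extent or depth by choosing a mask over the corresponding axis.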