We present Differentiable Neural Architectures (DNArch), a method that jointly learns the weights and the architecture of Convolutional Neural Networks (CNNs) by backpropagation. In particular, DNArch allows learning (i) the size of convolutional kernels at each layer, (ii) the number of channels at each layer, (iii) the position and values of downsampling layers, and (iv) the depth of the network. To this end, DNArch views neural architectures as continuous multidimensional entities and uses learnable differentiable masks along each dimension to control their size. Unlike existing methods, DNArch is not limited to a predefined set of possible neural components; instead, it can discover entire CNN architectures across all feasible combinations of kernel sizes, widths, depths, and downsampling. Empirically, DNArch finds performant CNN architectures for several classification and dense prediction tasks on sequential and image data. When combined with a loss term that controls network complexity, DNArch constrains its search during training to architectures that respect a predefined computational budget.
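The abstract does not specify the functional form of the masks or of the complexity loss. As a hedged illustration only, the sketch below assumes a sigmoid mask with a learnable cutoff `c` over one dimension normalized to [0, 1] (channels, kernel taps, or layers), so that the "soft size" of that dimension is the sum of the mask values and is differentiable in `c`. The names `soft_mask`, `soft_size`, `soft_size_grad`, and `budget_penalty`, the temperature of 20, and the squared-hinge budget penalty are all hypothetical choices, not DNArch's actual formulation. Pure Python is used, so the gradient is written out analytically instead of relying on an autograd library.

```python
import math


def sigmoid(z: float) -> float:
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def soft_mask(n: int, cutoff: float, temperature: float = 20.0) -> list[float]:
    """Hypothetical mask over n positions normalized to [0, 1].

    Values are close to 1 below `cutoff` and close to 0 above it.
    """
    positions = [i / (n - 1) for i in range(n)] if n > 1 else [0.0]
    return [sigmoid(temperature * (cutoff - x)) for x in positions]


def soft_size(n: int, cutoff: float, temperature: float = 20.0) -> float:
    # Effective (soft) size of the dimension: the sum of the mask values.
    return sum(soft_mask(n, cutoff, temperature))


def soft_size_grad(n: int, cutoff: float, temperature: float = 20.0) -> float:
    # d/dc sigmoid(T * (c - x)) = T * s * (1 - s), summed over positions.
    return sum(temperature * s * (1.0 - s) for s in soft_mask(n, cutoff, temperature))


def budget_penalty(cost: float, budget: float) -> float:
    # Assumed squared-hinge penalty: zero while the cost stays within budget.
    return max(0.0, cost - budget) ** 2


# Hypothetical usage: one layer with up to 64 channels and 33 kernel taps.
width = soft_size(64, cutoff=0.5)   # about 32 channels kept
ksize = soft_size(33, cutoff=0.25)  # about 8.5 taps kept
cost = width * width * ksize        # crude parameter-count proxy (assumption)
loss_budget = budget_penalty(cost, budget=20000.0)
```

Because every mask value lies strictly between 0 and 1, the gradient with respect to the cutoff is never exactly zero, so a training loss (plus a budget term) can shrink or grow each dimension continuously. A larger temperature makes the mask sharper, approaching a hard truncation, but concentrates the gradient near the cutoff.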