Conditional computing processes an input using only part of a neural network's computational units. Learning to execute parts of a deep convolutional network by routing individual samples has several advantages: the most obvious is a reduced computational burden. Furthermore, if similar classes are routed along the same path, that part of the network learns to discriminate between finer differences, so better classification accuracy can be attained with fewer parameters. Recently, several papers have exploited this idea, either selecting a particular child of a node in a tree-shaped network or skipping parts of a network entirely. In this work, we follow a Trellis-based approach for generating specific execution paths in a deep convolutional neural network. We design routing mechanisms that use differentiable, information gain-based cost functions to determine which subset of features in a convolutional layer will be executed. We call our method Conditional Information Gain Trellis (CIGT). We show that our conditional execution mechanism achieves comparable or better model performance than unconditional baselines while using only a fraction of the computational resources.
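To make the information gain objective concrete, the sketch below estimates the mutual information between class labels and soft routing decisions over a mini-batch, which is the quantity an information gain-based routing cost would reward. This is a minimal NumPy illustration under our own assumptions, not the paper's implementation: the function name `information_gain` and the batch-estimation scheme are hypothetical, and a trainable version would compute the same quantity on differentiable softmax outputs.

```python
import numpy as np

def information_gain(route_probs, labels, num_classes, eps=1e-12):
    """Estimate I(C; R), the mutual information between class labels C and
    soft routing decisions R, from a mini-batch. Higher values mean the
    router separates the classes more cleanly across paths.

    route_probs: (n, num_routes) rows summing to 1 (e.g. softmax outputs).
    labels:      (n,) integer class labels.
    """
    n, _ = route_probs.shape
    onehot = np.eye(num_classes)[labels]        # (n, num_classes)
    joint = onehot.T @ route_probs / n          # p(c, r) batch estimate
    p_c = joint.sum(axis=1)                     # class marginal p(c)
    p_r = joint.sum(axis=0)                     # route marginal p(r)
    entropy = lambda p: -np.sum(p * np.log(p + eps))
    # I(C; R) = H(C) + H(R) - H(C, R)
    return entropy(p_c) + entropy(p_r) - entropy(joint)
```

For instance, a router that sends each class to its own path attains the maximum value (log 2 for two balanced classes on two routes), while a router that spreads every sample uniformly scores zero; maximizing this quantity therefore pushes semantically similar samples onto shared paths.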