Neural networks are known to perform better with increased depth because deeper models can learn more abstract features. Although deepening is well established, there is still room for more efficient feature extraction within a layer, which would reduce the need for simply adding parameters. Conventional widening, which adds more filters to each layer, increases the parameter count quadratically. Placing multiple parallel convolutional or dense operations in each layer avoids this quadratic growth, but without context-dependent allocation of resources among these operations, the parallel computations tend to learn similar features, which makes widening less effective. We therefore propose multi-path neural networks with data-dependent resource allocation among the parallel computations within layers, which also lets an input be routed end-to-end through these parallel paths. To do this, we first introduce a cross-prediction based algorithm between parallel tensors of subsequent layers. Second, we further reduce the routing overhead by introducing feature-dependent cross-connections between parallel tensors of successive layers. In image recognition, our multi-path networks outperform existing widening and adaptive feature extraction methods, and even ensembles and deeper networks, at similar complexity.
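The quadratic-versus-linear growth claim can be checked with simple arithmetic. The sketch below is illustrative only: the helper names (`conv_params`, `widened_params`, `parallel_params`), the 3x3 kernel, and the 64-channel layer are assumptions, and the parallel case assumes fully independent paths with no cross-connections between them.

```python
def conv_params(c_in, c_out, k=3):
    """Weights plus biases of a single k x k convolution (hypothetical helper)."""
    return k * k * c_in * c_out + c_out


def widened_params(c_in, c_out, factor, k=3):
    """Conventional widening: both input and output channels scale by `factor`."""
    return conv_params(c_in * factor, c_out * factor, k)


def parallel_params(c_in, c_out, n_paths, k=3):
    """n_paths independent copies of the base convolution (assumed: no cross-connections)."""
    return n_paths * conv_params(c_in, c_out, k)


base = conv_params(64, 64)
for factor in (2, 4):
    wide = widened_params(64, 64, factor)
    para = parallel_params(64, 64, factor)
    print(f"x{factor}: widened {wide / base:.2f}x base, parallel {para / base:.2f}x base")
```

Doubling the width roughly quadruples the weight count, while two independent parallel paths only double it, which is the gap the abstract refers to.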
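One way to picture feature-dependent cross-connections between parallel tensors of successive layers is as soft gates computed from each input path's own features. The following is a minimal sketch under stated assumptions, not the paper's algorithm: the function `route`, the linear gate parameters `gate_weights`, and the softmax normalization over output paths are all hypothetical choices made for illustration.

```python
import math
import random


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def route(paths_in, gate_weights):
    """Softly route each input path to every output path (illustrative only).

    paths_in:     list of n_in feature vectors, each of length d.
    gate_weights: gate_weights[i][j] is a length-d vector; its dot product with
                  input path i is the logit for sending path i to output path j.
    Returns (outputs, gates): n_out output vectors and the n_in x n_out gates.
    """
    n_out = len(gate_weights[0])
    d = len(paths_in[0])
    outputs = [[0.0] * d for _ in range(n_out)]
    gates = []
    for i, x in enumerate(paths_in):
        # The gates depend on the input features, so allocation is data-dependent.
        logits = [sum(w * v for w, v in zip(gate_weights[i][j], x))
                  for j in range(n_out)]
        g = softmax(logits)
        gates.append(g)
        for j in range(n_out):
            for k in range(d):
                outputs[j][k] += g[j] * x[k]
    return outputs, gates


random.seed(0)
d, n_in, n_out = 4, 2, 2
paths = [[random.gauss(0, 1) for _ in range(d)] for _ in range(n_in)]
weights = [[[random.gauss(0, 1) for _ in range(d)] for _ in range(n_out)]
           for _ in range(n_in)]
outputs, gates = route(paths, weights)
print(gates)
```

Because each input path's gates sum to one, the path's features are divided among the output paths rather than duplicated, so different inputs can favor different paths.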