In the few years since the introduction of ResNet, the skip connection has become the de facto standard in the design of modern architectures, owing to its widespread adoption, ease of optimization, and proven performance. Prior work has explained the effectiveness of the skip connection mechanism from different perspectives. In this work, we take a close look at the behavior of models with skip connections, which can be formulated as a learnable Markov chain. An efficient Markov chain is preferable because it always maps the input data to the target domain more effectively. However, even when a model can be interpreted as a Markov chain, existing SGD-based optimizers, which are prone to getting trapped in local optima, do not guarantee that it is optimized as an efficient one. To move toward a more efficient Markov chain, we propose a simple routine, the penal connection, that turns any residual-like model into a learnable Markov chain. In addition, the penal connection can be viewed as a particular form of model regularization and can be implemented with one line of code in the most popular deep learning frameworks~\footnote{Source code: \url{https://github.com/densechen/penal-connection}}. Encouraging experimental results in multi-modal translation and image recognition empirically confirm our conjecture of the learnable Markov chain view and demonstrate the superiority of the proposed penal connection.