Capsule Networks outperform Convolutional Neural Networks at learning part-whole relationships with viewpoint invariance, a strength attributable to their multidimensional capsules. It had been assumed that increasing the number of capsule layers would enhance model performance. However, recent studies found that Capsule Networks lack scalability due to vanishing activations in the capsules of deeper layers. This paper thoroughly investigates the vanishing activation problem in deep Capsule Networks. To analyze this issue and to understand how increasing capsule dimensions can facilitate deeper networks, various Capsule Network models are constructed and evaluated with different numbers of capsules, capsule dimensions, and intermediate layers. Unlike traditional model pruning, which reduces the number of model parameters to expedite training, this study uses pruning to mitigate vanishing activations in the deeper capsule layers. In addition, the backbone network and capsule layers are pruned at different pruning ratios, reducing the number of inactive capsules and achieving better accuracy than the unpruned models.
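For intuition only, and not as the paper's actual implementation, the following minimal Python sketch illustrates the two ideas the abstract relies on: a capsule's activation is the length of its output vector (here produced by the standard squashing nonlinearity), and pruning at a given ratio removes the least-active capsules. The function names, array shapes, and the 25% pruning ratio are illustrative assumptions.

```python
import numpy as np

def squash(s, axis=-1, eps=1e-8):
    """Standard capsule squashing nonlinearity: output norm lies in (0, 1)
    and serves as the capsule's activation."""
    sq_norm = np.sum(s ** 2, axis=axis, keepdims=True)
    return (sq_norm / (1.0 + sq_norm)) * s / np.sqrt(sq_norm + eps)

def prune_inactive_capsules(capsules, prune_ratio):
    """Illustrative pruning step: drop the prune_ratio fraction of capsules
    with the smallest activation norms.

    capsules: array of shape (num_capsules, capsule_dim).
    Returns the surviving capsules and their original indices.
    """
    norms = np.linalg.norm(capsules, axis=-1)   # activation = vector length
    keep = int(np.ceil(len(norms) * (1.0 - prune_ratio)))
    kept_idx = np.argsort(norms)[::-1][:keep]   # keep the most active capsules
    return capsules[kept_idx], kept_idx

# Hypothetical example: 32 capsules of dimension 8, pruning the 25% least active.
caps = squash(np.random.randn(32, 8))
pruned, idx = prune_inactive_capsules(caps, prune_ratio=0.25)
print(pruned.shape)  # (24, 8)
```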