Hierarchical Object-Centric Learning with Capsule Networks

Capsule networks (CapsNets) were introduced to address the limitations of convolutional neural networks by learning object-centric representations that are more robust, pose-aware, and interpretable. They organize neurons into groups called capsules, where each capsule encodes the instantiation parameters of an object or one of its parts. A routing algorithm connects capsules in adjacent layers, thereby capturing hierarchical part-whole relationships in the data. This thesis investigates CapsNets through three key questions aimed at unlocking their full potential. First, we explore the effectiveness of the routing algorithm, particularly in small networks. We propose a novel method that anneals the number of routing iterations during training, improving performance in architectures with fewer parameters. Second, we investigate methods to extract more effective first-layer capsules, also known as primary capsules. By exploiting pruned backbones, we reduce the number of capsules while maintaining high generalization, which lowers the memory requirements and computational effort of CapsNets. Third, we explore part-whole relationship learning in CapsNets. Through extensive research, we demonstrate that capsules with low entropy can extract more concise and discriminative part-whole relationships than traditional capsule networks, even at reasonable network sizes. Finally, we showcase how CapsNets can be used in real-world applications, including autonomous localization of unmanned aerial vehicles, quaternion-based rotation prediction on synthetic datasets, and lung nodule segmentation in biomedical imaging. The findings presented in this thesis contribute to a deeper understanding of CapsNets and highlight their potential to address complex computer vision challenges.
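
To make the routing step and the notion of entropy more concrete, the following is a minimal sketch in plain Python. The abstract does not state which routing algorithm the thesis uses, so the sketch assumes the commonly used routing-by-agreement scheme with a squash nonlinearity. The names `squash`, `route`, and `coupling_entropy` are illustrative and not taken from the thesis, and measuring entropy over the coupling coefficients is only one plausible reading of "low entropy". The `iterations` argument is the quantity an annealing schedule would vary; no particular schedule is assumed here.

```python
import math


def squash(v):
    """Rescale vector v to a length in [0, 1) while keeping its direction."""
    sq = sum(x * x for x in v)
    if sq == 0.0:
        return [0.0 for _ in v]
    scale = sq / (1.0 + sq) / math.sqrt(sq)
    return [scale * x for x in v]


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def route(u_hat, iterations=3):
    """Routing-by-agreement (assumed scheme, see the lead-in).

    u_hat[i][j] is the vote of lower capsule i for upper capsule j.
    Returns (upper capsule outputs, coupling coefficients).
    """
    if iterations < 1:
        raise ValueError("iterations must be >= 1")
    n_in, n_out, dim = len(u_hat), len(u_hat[0]), len(u_hat[0][0])
    b = [[0.0] * n_out for _ in range(n_in)]  # routing logits, start uniform
    for _ in range(iterations):
        # Each lower capsule distributes its vote over the upper capsules.
        c = [softmax(row) for row in b]
        # Upper capsule output: squashed, coupling-weighted sum of votes.
        v = [
            squash([sum(c[i][j] * u_hat[i][j][d] for i in range(n_in))
                    for d in range(dim)])
            for j in range(n_out)
        ]
        # Raise the logit where a vote agrees with the upper capsule output.
        for i in range(n_in):
            for j in range(n_out):
                b[i][j] += sum(u_hat[i][j][d] * v[j][d] for d in range(dim))
    return v, c


def coupling_entropy(c):
    """Mean Shannon entropy (in nats) of each lower capsule's couplings."""
    per_capsule = [-sum(p * math.log(p) for p in row if p > 0.0) for row in c]
    return sum(per_capsule) / len(per_capsule)


# Toy votes: both lower capsules agree on upper capsule 0, disagree on 1.
votes = [
    [[1.0, 0.0], [1.0, 0.0]],
    [[1.0, 0.0], [-1.0, 0.0]],
]
outputs_1, couplings_1 = route(votes, iterations=1)
outputs_3, couplings_3 = route(votes, iterations=3)
```

In this toy case, a single iteration leaves the couplings uniform, while further iterations shift them toward the upper capsule whose votes agree, so `coupling_entropy(couplings_3)` is lower than `coupling_entropy(couplings_1)`.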
