Real-world visual data exhibit intrinsic hierarchical structures that can be represented effectively in hyperbolic spaces. Hyperbolic neural networks (HNNs) are a promising approach for learning feature representations in such spaces. However, current HNNs in computer vision rely on Euclidean backbones and only project features to the hyperbolic space in the task heads, limiting their ability to fully leverage the benefits of hyperbolic geometry. To address this, we present HCNN, a fully hyperbolic convolutional neural network (CNN) designed for computer vision tasks. Based on the Lorentz model, we generalize fundamental components of CNNs and propose novel formulations of the convolutional layer, batch normalization, and multinomial logistic regression. Experiments on standard vision tasks demonstrate the promising performance of our HCNN framework in both hybrid and fully hyperbolic settings. Overall, we believe our contributions provide a foundation for developing more powerful HNNs that can better represent complex structures found in image data. Our code is publicly available at https://github.com/kschwethelm/HyperbolicCV.
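To make the step of projecting features to hyperbolic space more concrete, the following is a minimal sketch of the Lorentz (hyperboloid) model, assuming constant curvature -c with c > 0. It maps a Euclidean feature vector onto the hyperboloid with the exponential map at the origin and measures geodesic distance there. The function names `lorentz_inner`, `expmap0`, and `lorentz_distance` are hypothetical and chosen for illustration; they are not taken from the HyperbolicCV repository, and this is not the HCNN layer formulation itself.

```python
import math


def lorentz_inner(x, y):
    """Lorentzian inner product: -x0*y0 + sum_i xi*yi."""
    return -x[0] * y[0] + sum(a * b for a, b in zip(x[1:], y[1:]))


def expmap0(u, c=1.0):
    """Map a Euclidean vector u (a tangent vector at the origin) onto the
    hyperboloid {x : <x, x>_L = -1/c, x0 > 0}. Assumes curvature -c, c > 0."""
    sqrt_c = math.sqrt(c)
    norm = math.sqrt(sum(a * a for a in u))
    if norm < 1e-12:
        # The zero vector maps to the hyperboloid origin (1/sqrt(c), 0, ..., 0).
        return [1.0 / sqrt_c] + [0.0] * len(u)
    time = math.cosh(sqrt_c * norm) / sqrt_c
    scale = math.sinh(sqrt_c * norm) / (sqrt_c * norm)
    return [time] + [scale * a for a in u]


def lorentz_distance(x, y, c=1.0):
    """Geodesic distance between two points on the hyperboloid."""
    # Clamp to the domain of acosh to guard against rounding error.
    arg = max(1.0, -c * lorentz_inner(x, y))
    return math.acosh(arg) / math.sqrt(c)


# Usage: project a 2-D Euclidean feature and check the hyperboloid constraint.
feature = [0.3, -0.4]
point = expmap0(feature)
origin = expmap0([0.0, 0.0])
constraint = lorentz_inner(point, point)       # approximately -1/c
dist_to_origin = lorentz_distance(origin, point)  # approximately the norm of `feature`
```

In this sketch the distance from the origin to the mapped point equals the Euclidean norm of the input feature, which is a convenient sanity check for any implementation of the exponential map.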