Because the volume of a ball in hyperbolic space grows exponentially with its radius, hyperbolic space can embed trees with arbitrarily small distortion and has therefore received wide attention for representing hierarchical datasets. However, this exponential growth comes at the price of numerical instability: training hyperbolic learning models sometimes leads to catastrophic NaN problems, where values unrepresentable in floating-point arithmetic are encountered. In this work, we carefully analyze the limitations of two popular models of hyperbolic space, namely the Poincaré ball and the Lorentz model. We first show that, under 64-bit arithmetic, the Poincaré ball has a larger capacity than the Lorentz model for correctly representing points. Then, we theoretically validate the superiority of the Lorentz model over the Poincaré ball from the perspective of optimization. Given the numerical limitations of both models, we identify a Euclidean parametrization of hyperbolic space that can alleviate these limitations. We further extend this Euclidean parametrization to hyperbolic hyperplanes and demonstrate its ability to improve the performance of hyperbolic SVM.
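The representation limits described in the abstract can be illustrated with a minimal sketch. It assumes curvature -1 and a point at hyperbolic distance d from the origin along a single axis, so that the Poincaré-ball norm is tanh(d/2) and the Lorentz coordinates are (cosh d, sinh d). The helper names `poincare_norm` and `lorentz_residual` and the sample distances are hypothetical choices for illustration and are not taken from the paper.

```python
import math


def poincare_norm(d: float) -> float:
    """Euclidean norm of the Poincare-ball point at hyperbolic distance d
    from the origin (assumption: curvature -1)."""
    return math.tanh(d / 2.0)


def lorentz_residual(d: float) -> float:
    """Value of -x0^2 + x1^2 + 1 for the Lorentz point (cosh d, sinh d).

    In exact arithmetic this is 0, because every point on the hyperboloid
    satisfies -x0^2 + x1^2 = -1. A nonzero value means the stored point
    has drifted off the hyperboloid in float64.
    """
    x0, x1 = math.cosh(d), math.sinh(d)
    return -x0 * x0 + x1 * x1 + 1.0


if __name__ == "__main__":
    # Sample distances are illustrative only.
    for d in (5.0, 20.0, 30.0, 50.0):
        # A norm of exactly 1.0 means the point has left the open unit ball.
        on_boundary = poincare_norm(d) >= 1.0
        print(f"d={d:5.1f}  poincare_on_boundary={on_boundary}  "
              f"lorentz_residual={lorentz_residual(d):.3e}")
```

Under these assumptions, the Lorentz constraint is already visibly violated at a distance where the Poincaré norm is still strictly below 1, which is consistent with the capacity comparison stated in the abstract. The exact radii at which each model fails depend on the floating-point format and on how the point is stored, so this sketch should not be read as reproducing the paper's analysis.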