Because the volume of a ball in hyperbolic space grows exponentially with its radius, the hyperbolic space can embed trees with arbitrarily small distortion and has therefore received wide attention for representing hierarchical datasets. However, this exponential growth comes at the price of numerical instability: training hyperbolic learning models sometimes leads to catastrophic NaN problems when computations encounter values that are unrepresentable in floating-point arithmetic. In this work, we carefully analyze the limitations of two popular models of the hyperbolic space, namely the Poincar\'e ball and the Lorentz model. We first show that, under 64-bit arithmetic, the Poincar\'e ball has a relatively larger capacity than the Lorentz model for correctly representing points. We then theoretically validate the superiority of the Lorentz model over the Poincar\'e ball from the perspective of optimization. Given the numerical limitations of both models, we identify a Euclidean parametrization of the hyperbolic space that can alleviate these limitations. We further extend this Euclidean parametrization to hyperbolic hyperplanes and demonstrate its ability to improve the performance of hyperbolic SVM.
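The capacity claim for 64-bit arithmetic can be illustrated with a small sketch. It assumes curvature -1 and places points on a single geodesic through the origin. A point at hyperbolic distance d has Euclidean norm tanh(d/2) in the Poincar\'e ball and coordinates (cosh d, sinh d) in the 1+1-dimensional Lorentz model. The failure criteria are illustrative choices, not necessarily the definitions used in the work: "the Poincar\'e point rounds onto the boundary" and "the two Lorentz coordinates round to the same double". The helper names (`poincare_radius`, `lorentz_coords`, `first_failure`) are hypothetical.

```python
import math


def poincare_radius(d):
    """Euclidean norm of a Poincare-ball point at hyperbolic distance d from the origin (curvature -1)."""
    return math.tanh(d / 2.0)


def lorentz_coords(d):
    """(x0, x1) of a Lorentz-model point at hyperbolic distance d from the origin (1, 0), in 1+1 dimensions."""
    return math.cosh(d), math.sinh(d)


def first_failure(is_broken, max_d=60):
    """Smallest integer distance d at which the float64 representation degenerates (illustrative criterion)."""
    for d in range(1, max_d + 1):
        if is_broken(d):
            return d
    raise ValueError("no failure found up to max_d")


# Poincare ball: the point rounds onto the boundary ||x|| = 1, where the
# conformal factor 2 / (1 - ||x||^2) would divide by zero.
poincare_limit = first_failure(lambda d: poincare_radius(d) == 1.0)


# Lorentz model: x0 and x1 round to the same double, so the constraint
# x0^2 - x1^2 = 1 can no longer be satisfied.
def lorentz_broken(d):
    x0, x1 = lorentz_coords(d)
    return x0 == x1


lorentz_limit = first_failure(lorentz_broken)

print(f"Poincare ball degenerates at d = {poincare_limit}")
print(f"Lorentz model degenerates at d = {lorentz_limit}")
```

A back-of-the-envelope check explains the gap. In the Poincar\'e ball, 1 - tanh(d/2) is about 2e^{-d}, which falls below half the float64 spacing just under 1 (2^{-54}) at roughly d = 55 ln 2, or about 38. In the Lorentz model, cosh d - sinh d = e^{-d}, which falls below half the spacing of doubles near cosh d (about e^d/2) at roughly d = 27 ln 2, or about 19. The exact integers printed may vary slightly with the platform's libm.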