Recent studies have shown experimentally that graph embedding, which aims to obtain vertex representations that reflect the graph's structure in a metric space, can be both effective and efficient in non-Euclidean metric spaces. In particular, graph embedding in hyperbolic space has succeeded experimentally on graphs with a hierarchical tree structure, e.g., data in natural languages, social networks, and knowledge bases. However, recent theoretical analyses have given a much higher upper bound on the generalization error of non-Euclidean graph embedding than on that of Euclidean graph embedding, where a high generalization error indicates that incompleteness and noise in the data can significantly damage learning performance. This implies that the existing bound cannot guarantee the success of graph embedding in non-Euclidean metric space at a practical training data size, which can hinder its application to real problems. This paper provides a novel upper bound on the generalization error of graph embedding by evaluating the local Rademacher complexity of the model, viewed as a set of functions of the distances between pairs of representations. Our bound shows that the performance of graph embedding in non-Euclidean metric space, including hyperbolic space, is better than the existing upper bounds suggest. Specifically, our new upper bound is polynomial in the metric space's geometric radius $R$ and can be $O(\frac{1}{S})$ at the fastest, where $S$ is the training data size. It is significantly tighter and converges faster than the existing bound, which can be exponential in $R$ and is $O(\frac{1}{\sqrt{S}})$ at the fastest. Specific calculations on example cases show that graph embedding in non-Euclidean metric space can outperform that in Euclidean space with much less training data than the existing bound suggests.
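To give a rough feel for the gap between the two rates, the following sketch compares how much training data each schematic form would need to push a bound below a target value. It is only an illustration under strong assumptions: the forms $e^{R}/\sqrt{S}$ and $R^{k}/S$, the unit constants, the polynomial degree $k = 2$, and the function names are all hypothetical stand-ins, not the bounds derived in the paper.

```python
import math


def required_samples_slow(radius: float, eps: float) -> float:
    """Smallest S with exp(radius) / sqrt(S) <= eps.

    Hypothetical stand-in for a bound that is exponential in R
    and decays as O(1/sqrt(S)); constants are assumed to be 1.
    """
    return (math.exp(radius) / eps) ** 2


def required_samples_fast(radius: float, eps: float, degree: int = 2) -> float:
    """Smallest S with radius**degree / S <= eps.

    Hypothetical stand-in for a bound that is polynomial in R
    and decays as O(1/S); the degree is an assumed value.
    """
    return radius ** degree / eps


if __name__ == "__main__":
    eps = 0.1  # assumed target value for the bound
    for radius in (1.0, 2.0, 4.0, 8.0):
        slow = required_samples_slow(radius, eps)
        fast = required_samples_fast(radius, eps)
        print(f"R={radius:>4}: slow form needs S >= {slow:.3g}, "
              f"fast form needs S >= {fast:.3g}")
```

Under these assumed forms, the data requirement of the exponential, $O(\frac{1}{\sqrt{S}})$ form grows far faster with $R$ than that of the polynomial, $O(\frac{1}{S})$ form, which matches the qualitative claim above; the actual constants and exponents depend on the paper's analysis.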