We prove an exponential separation in sample complexity between Euclidean and hyperbolic representations for learning on hierarchical data under standard Lipschitz regularization. For depth-$R$ hierarchies with branching factor $m$, we first establish a geometric obstruction for Euclidean space: any bounded-radius embedding forces volumetric collapse, mapping exponentially many tree-distant points to nearby locations. This necessitates Lipschitz constants scaling as $\exp(\Omega(R))$ to realize even simple hierarchical targets, yielding exponential sample complexity under capacity control. We then show this obstruction vanishes in hyperbolic space: constant-distortion hyperbolic embeddings admit $O(1)$-Lipschitz realizability, enabling learning with $n = O(mR \log m)$ samples. A matching $\Omega(mR \log m)$ lower bound via Fano's inequality establishes that hyperbolic representations achieve the information-theoretic optimum. We also show a geometry-independent bottleneck: any rank-$k$ prediction space captures only $O(k)$ canonical hierarchical contrasts.
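The volumetric contrast underlying the separation can be checked numerically: a circle of radius $r$ in the hyperbolic plane (curvature $-1$) has circumference $2\pi \sinh r \approx \pi e^r$, so radius $r = \Theta(R \log m)$ already accommodates all $m^R$ leaves of a depth-$R$ hierarchy with constant separation, whereas a Euclidean circle of the same radius has circumference only $2\pi r$. A minimal sketch (not part of the paper; the parameter values are illustrative):

```python
import math

def hyperbolic_circumference(r):
    # Circumference of a radius-r circle in the hyperbolic plane (curvature -1):
    # grows like e^r, unlike the linear Euclidean growth.
    return 2 * math.pi * math.sinh(r)

def euclidean_circumference(r):
    return 2 * math.pi * r

m, R = 3, 10        # branching factor and depth (illustrative choices)
leaves = m ** R     # number of leaves in the hierarchy: m^R = 59049
sep = 1.0           # desired pairwise separation along the circle

# Smallest radius whose hyperbolic circumference fits all leaves with unit gaps:
r = 1.0
while hyperbolic_circumference(r) < leaves * sep:
    r += 0.1

print(f"{leaves} leaves fit at hyperbolic radius ~{r:.1f}")
print(f"Euclidean circumference at that radius: {euclidean_circumference(r):.1f}")
```

With these values the required hyperbolic radius is roughly $10 \approx R + \log m$, while the Euclidean circumference at the same radius fits only about sixty unit-separated points, which is the collapse the Lipschitz lower bound quantifies.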