Metric embeddings traditionally study how to map $n$ items to a target metric space so that distances are not heavily distorted; but what if we only care about preserving the relative order of the distances, and not their magnitudes? In this paper, we are motivated by the following basic question: given triplet comparisons of the form ``item $i$ is closer to item $j$ than to item $k$,'' can we find low-dimensional Euclidean representations for the $n$ items that respect those distance comparisons? Such order-preserving embeddings arise naturally in important applications and have been studied since the 1950s under the name of ordinal or non-metric embeddings. Our main results are: 1. Nearly-Tight Bounds on Triplet Dimension: We introduce the natural concept of the triplet dimension of a dataset and, surprisingly, show that for an ordinal embedding to be triplet-preserving, its dimension needs to grow as $\frac{n}{2}$ in the worst case. This is optimal (up to a constant factor), as $n-1$ dimensions always suffice. 2. Tradeoffs for Dimension vs (Ordinal) Relaxation: We then relax the requirement that every triplet be exactly preserved and present almost tight lower bounds on the maximum ratio between distances whose relative order is inverted by the embedding; this ratio is known as (ordinal) relaxation in the literature and serves as a counterpart to (metric) distortion. 3. New Bounds on Terminal and Top-$k$-NNs Embeddings: Going beyond triplets, we then study two well-motivated scenarios where we care about preserving specific sets of distances (not necessarily triplets). The first scenario is Terminal Ordinal Embeddings and the second is Top-$k$-NNs Ordinal Embeddings. To the best of our knowledge, these are some of the first tradeoffs for triplet-preserving ordinal embeddings and the first study of Terminal and Top-$k$-NNs Ordinal Embeddings.
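To make the two central notions concrete, the following is a minimal Python sketch, not the paper's method. It checks which triplet comparisons a candidate Euclidean embedding violates and computes a triplet-restricted relaxation value. The function names `violated_triplets` and `triplet_relaxation` are hypothetical. The sketch also rests on two assumptions: relaxation is measured as the largest ratio of original distances over triplets whose order the embedding inverts, and ties in the embedding count as inversions.

```python
import math


def violated_triplets(points, triplets):
    """Return the triplets (i, j, k) that the embedding fails to respect.

    A triplet (i, j, k) encodes "item i is closer to item j than to item k".
    It is respected when the embedded distance from i to j is strictly smaller
    than the embedded distance from i to k.
    """
    return [
        (i, j, k)
        for (i, j, k) in triplets
        if math.dist(points[i], points[j]) >= math.dist(points[i], points[k])
    ]


def triplet_relaxation(orig_dist, points):
    """Largest ratio orig_dist[i][k] / orig_dist[i][j] over inverted triplets.

    Returns 1.0 when no triplet is inverted. Assumes (for illustration) that
    distinct items have strictly positive original distance.
    """
    n = len(points)
    worst = 1.0
    for i in range(n):
        for j in range(n):
            for k in range(n):
                if len({i, j, k}) < 3:
                    continue
                originally_closer = orig_dist[i][j] < orig_dist[i][k]
                inverted = (
                    math.dist(points[i], points[j])
                    >= math.dist(points[i], points[k])
                )
                if originally_closer and inverted:
                    worst = max(worst, orig_dist[i][k] / orig_dist[i][j])
    return worst


# Three items with original distances d(0,1)=1, d(0,2)=2, d(1,2)=2.5,
# embedded on a line at positions 0, 3, and 1.
orig = [
    [0.0, 1.0, 2.0],
    [1.0, 0.0, 2.5],
    [2.0, 2.5, 0.0],
]
embedding = [(0.0,), (3.0,), (1.0,)]

bad = violated_triplets(embedding, [(0, 1, 2), (2, 0, 1)])
alpha = triplet_relaxation(orig, embedding)
```

In this toy instance the embedding respects the comparison centered at item 2 but inverts the one centered at item 0, so an exactly triplet-preserving embedding would need different coordinates. The relaxation value quantifies how badly the worst inverted comparison is violated in terms of the original distances.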