Metric embeddings traditionally study how to map $n$ items to a target metric space so that the lengths of distances are not heavily distorted; but what if we only care about preserving the relative order of the distances (and not their lengths)? In this paper, we are motivated by the following basic question: given triplet comparisons of the form ``item $i$ is closer to item $j$ than to item $k$,'' can we find low-dimensional Euclidean representations for the $n$ items that respect those distance comparisons? Such order-preserving embeddings arise naturally in important applications and have been studied since the 1950s under the name of ordinal or non-metric embeddings. Our main results are:

1. Nearly-Tight Bounds on Triplet Dimension: We introduce the natural concept of the triplet dimension of a dataset and, surprisingly, show that for an ordinal embedding to be triplet-preserving, its dimension must grow as $\frac n2$ in the worst case. This is optimal up to a constant factor, since $n-1$ dimensions always suffice.
2. Tradeoffs for Dimension vs (Ordinal) Relaxation: We then relax the requirement that every triplet be exactly preserved and present almost tight lower bounds on the maximum ratio between distances whose relative order is inverted by the embedding; this ratio is known as (ordinal) relaxation in the literature and serves as a counterpart to (metric) distortion.
3. New Bounds on Terminal and Top-$k$-NNs Embeddings: Going beyond triplets, we then study two well-motivated scenarios in which we care about preserving specific sets of distances (not necessarily triplets): Terminal Ordinal Embeddings and Top-$k$-NNs Ordinal Embeddings.

To the best of our knowledge, these are among the first tradeoffs for triplet-preserving ordinal embeddings and the first study of Terminal and Top-$k$-NNs Ordinal Embeddings.
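The claim that $n-1$ dimensions always suffice can be illustrated with a standard construction, which is not necessarily the argument used in the paper. Add one large constant $c$ to every off-diagonal dissimilarity $\delta_{ij}$ and realize the result as squared Euclidean distances. The Gram matrix of points $1,\dots,n-1$ relative to point $0$ then becomes positive definite, so a Cholesky factor yields coordinates in $\mathbb{R}^{n-1}$. Because every embedded distance equals $\sqrt{\delta_{ij}+c}$, which is monotone in $\delta_{ij}$, every distance comparison (in particular every triplet) is preserved.

The Python sketch below assumes the triplets come from a known symmetric dissimilarity matrix `delta` with positive off-diagonal entries. The names `ordinal_embedding` and `triplet_relaxation` are hypothetical. `triplet_relaxation` follows one common convention for (ordinal) relaxation: the largest ratio $\delta_{ik}/\delta_{ij}$ over triplets with $\delta_{ij}<\delta_{ik}$ that the embedding inverts. The paper's exact definition may differ in details.

```python
import itertools
import math
import random


def cholesky(a):
    """Return lower-triangular L with a = L L^T; raise ValueError if a is not positive definite."""
    n = len(a)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = a[i][j] - sum(low[i][k] * low[j][k] for k in range(j))
            if i == j:
                if s <= 0.0:
                    raise ValueError("matrix is not positive definite")
                low[i][i] = math.sqrt(s)
            else:
                low[i][j] = s / low[j][j]
    return low


def ordinal_embedding(delta):
    """Hypothetical helper: embed n items in R^(n-1) with squared distances delta[i][j] + c.

    Adding the same constant c to every off-diagonal dissimilarity keeps their order,
    so every triplet comparison induced by delta is preserved by the embedding.
    """
    n = len(delta)
    # Gram matrix of points 1..n-1 relative to point 0, treating delta as squared distances.
    g0 = [[(delta[0][i] + delta[0][j] - delta[i][j]) / 2.0 for j in range(1, n)]
          for i in range(1, n)]
    # Gershgorin: lambda_min(g0) >= -r, so g0 + (c/2)(I + J) has all eigenvalues >= 1.
    r = max((sum(abs(x) for x in row) for row in g0), default=0.0)
    c = 2.0 * r + 2.0
    g = [[g0[i][j] + c / 2.0 * ((i == j) + 1) for j in range(n - 1)] for i in range(n - 1)]
    low = cholesky(g)
    # Point 0 sits at the origin; point i >= 1 is row i-1 of the Cholesky factor.
    return [[0.0] * (n - 1)] + low


def triplet_relaxation(delta, points):
    """Hypothetical helper: max delta[i][k] / delta[i][j] over triplets with
    delta[i][j] < delta[i][k] whose embedded order is inverted (ties count as inverted).
    Returns 1.0 when no triplet is inverted."""
    worst = 1.0
    for i, j, k in itertools.permutations(range(len(delta)), 3):
        if delta[i][j] < delta[i][k] and \
                math.dist(points[i], points[j]) >= math.dist(points[i], points[k]):
            worst = max(worst, delta[i][k] / delta[i][j])
    return worst


# Usage: random distinct integer dissimilarities on n = 6 items.
random.seed(0)
n = 6
vals = random.sample(range(1, 100), n * (n - 1) // 2)
delta = [[0.0] * n for _ in range(n)]
for (i, j), v in zip(itertools.combinations(range(n), 2), vals):
    delta[i][j] = delta[j][i] = float(v)
pts = ordinal_embedding(delta)
print(len(pts[0]))                     # → 5, i.e. n - 1 coordinates
print(triplet_relaxation(delta, pts))  # → 1.0, every triplet preserved
```

The constant $c$ is chosen from a Gershgorin bound rather than by trial and error, so the factorization is well conditioned. Truncating the coordinates (for example `[p[:2] for p in pts]`) will typically invert some triplets, and `triplet_relaxation` then reports how far apart the inverted dissimilarities were.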