An ultrametric space, or infinity-metric space, is defined by a dissimilarity function that satisfies a strong triangle inequality: no side of a triangle is larger than the larger of the other two. We show that search in ultrametric spaces with a vantage point tree has worst-case complexity equal to the depth of the tree. Since datasets of interest are not ultrametric in general, we employ a projection operator that transforms an arbitrary dissimilarity function into an ultrametric while preserving nearest neighbors. We further learn an approximation of this projection operator to efficiently compute ultrametric distances between query points and points in the dataset. We then solve a more general problem in which we consider projections onto $q$-metric spaces, in which each triangle side raised to the power $q$ is no larger than the sum of the $q$-th powers of the other two. Note that using learned approximations of the projected $q$-metric distances makes the search pipeline approximate. Experiments show that larger values of $q$ yield faster search but lower recall. Overall, search in $q$-metric and infinity-metric spaces is competitive with existing search methods.
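The abstract does not say which projection operator is used. As a rough illustration only, the sketch below assumes one construction that has the stated properties: the projected distance between two points is the smallest $q$-norm of edge lengths over all paths joining them, which for $q = \infty$ becomes the smallest achievable maximum edge along a path. The function name `project_q_metric` and the example data are hypothetical and do not come from the paper.

```python
import numpy as np

def project_q_metric(D, q=np.inf):
    """Hypothetical helper (an assumption, not the paper's code).

    Floyd-Warshall-style pass over a symmetric dissimilarity matrix D with
    a zero diagonal. The cost of a path is the q-norm of its edge lengths;
    the result holds the cheapest path cost between every pair of points.
    With q = inf the path cost is the largest edge, giving an ultrametric.
    """
    P = np.array(D, dtype=float)
    n = P.shape[0]
    for k in range(n):
        a = P[:, k][:, None]  # current cost from each i to k
        b = P[k, :][None, :]  # current cost from k to each j
        if np.isinf(q):
            via_k = np.maximum(a, b)
        else:
            via_k = (a ** q + b ** q) ** (1.0 / q)
        # Keep the cheaper of the direct cost and the route through k.
        P = np.minimum(P, via_k)
    return P

# Example with made-up data: Euclidean distances between random points.
rng = np.random.default_rng(0)
X = rng.normal(size=(8, 3))
D = np.linalg.norm(X[:, None] - X[None, :], axis=-1)
U = project_q_metric(D)          # ultrametric (q = inf)
Q2 = project_q_metric(D, q=2.0)  # 2-metric
```

Under this assumption, every path leaving a point starts with an edge at least as long as the edge to its nearest neighbor, so the nearest neighbor in `D` remains a nearest neighbor in `U` and `Q2`, although ties can appear. The cubic cost of this pass over the whole dataset is one reason a learned approximation, as the abstract describes, would be attractive for query points.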