In this paper, we consider outlier embeddings into hierarchically well-separated trees (HSTs). In particular, for a metric $(X,d)$, let $k$ be the size of the smallest subset of $X$ (the ``outlier set'') whose removal leaves a metric that can be probabilistically embedded into HSTs with expected distortion at most $c$. Our primary result is an efficient algorithm that takes as input $(X,d)$ and a target distortion $c$ and, for any $ε>0$, samples from a probabilistic embedding with at most $O(\frac{k}{ε}\log^2 k)$ outliers and distortion at most $(32+ε)c$. To obtain this result, we show how to find good nested embeddings into HSTs and combine them with an approximation algorithm of Munagala et al. [MST23].
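To make the outlier definition concrete, the following is a minimal Python sketch that checks a candidate outlier set against a given distribution over tree metrics. It assumes a finite metric given as a function `d(x, y)` and a distribution given as a list of `(probability, d_T)` pairs. Both representations and the name `expected_distortion` are hypothetical and do not come from the paper. The sketch applies the standard requirements for a probabilistic embedding to the non-outlier points: every tree in the support must be non-contracting, and it reports the worst expected stretch over all pairs. It does not verify that each `d_T` comes from an HST or that the probabilities sum to 1. It is also not the paper's algorithm, which must *find* both the outlier set and the distribution.

```python
from itertools import combinations


def expected_distortion(points, d, outliers, trees):
    """Worst expected stretch of a distribution over tree metrics on the
    non-outlier points, or None if some tree contracts a pair.

    Hypothetical representation (an assumption, not from the paper):
      d:      function d(x, y) giving the original metric
      trees:  list of (probability, d_T) pairs, d_T(x, y) a tree metric
    """
    kept = [p for p in points if p not in outliers]
    worst = 1.0
    for x, y in combinations(kept, 2):
        base = d(x, y)
        if base == 0:
            continue  # coincident points impose no stretch constraint
        # Non-contraction: every tree in the support must dominate d.
        if any(dt(x, y) < base for _, dt in trees):
            return None
        # Expected tree distance relative to the original distance.
        expected = sum(prob * dt(x, y) for prob, dt in trees)
        worst = max(worst, expected / base)
    return worst
```

For example, take the points `0, 1, 2, 3` on a line with `d = |x - y|`. Suppose the distribution returns the path metric itself with probability 1/2 and twice the path metric with probability 1/2. Every pair then has expected stretch 1.5, so the function returns `1.5`.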