We study the design of embeddings into Euclidean space with outliers. Given a metric space $(X,d)$ and an integer $k$, the goal is to embed all but $k$ points of $X$ (the "outliers") into $\ell_2$ with the smallest possible distortion $c$. Finding the optimal distortion $c$ for a given outlier set size $k$ and, alternatively, finding the smallest $k$ for a given target distortion $c$ are both NP-hard problems. In fact, it is UGC-hard to approximate $k$ to within a factor smaller than $2$, even when the metric without its outliers is isometrically embeddable into $\ell_2$. We therefore consider bi-criteria approximations. Our main result is a polynomial-time algorithm that approximates the outlier set size to within an $O(\log^4 k)$ factor and the distortion to within a constant factor. The main technical component of our result is an approach for composing two given embeddings from subsets of $X$ into $\ell_2$ so that the composition inherits the distortion of each to within small multiplicative factors. Specifically, given an embedding of $S\subset X$ into $\ell_2$ with low distortion $c_S$ and an embedding of the entire set $X$ into $\ell_2$ with higher distortion $c_X$, we construct a single embedding that achieves the same distortion $c_S$ over pairs of points in $S$ and an expansion of at most $O(\log k)\cdot c_X$ over the remaining pairs of points, where $k=|X\setminus S|$. Our composition theorem extends to embeddings into arbitrary $\ell_p$ metrics for $p\ge 1$ and may be of independent interest. While unions of embeddings over disjoint sets have been studied previously, to our knowledge this is the first work to consider compositions of nested embeddings.
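To make the objective concrete, the following is a minimal sketch of how the distortion of a fixed embedding could be evaluated once a candidate outlier set is removed. It is not the algorithm of this work. It assumes the common definition of distortion as the largest expansion times the largest contraction over all pairs, and the names `distortion`, `outlier_distortion`, `d`, and `f` are hypothetical, chosen only for illustration.

```python
from itertools import combinations
import math


def distortion(d, f, points):
    """Distortion of the embedding f restricted to `points`.

    Assumed definition: (max expansion) * (max contraction) over all pairs.
    d is a symmetric distance matrix, f maps each point index to a
    coordinate tuple. Points are assumed distinct and f injective, so no
    distance is zero.
    """
    expansion = 0.0
    contraction = 0.0
    for u, v in combinations(points, 2):
        original = d[u][v]
        embedded = math.dist(f[u], f[v])  # Euclidean (l_2) distance
        expansion = max(expansion, embedded / original)
        contraction = max(contraction, original / embedded)
    return expansion * contraction


def outlier_distortion(d, f, outliers):
    """Distortion of f over the points that are not in `outliers`."""
    kept = [p for p in range(len(d)) if p not in outliers]
    return distortion(d, f, kept)


# Illustrative 4-point metric: points 0, 1, 2 lie on a line with unit
# spacing, and point 3 is at distance 1 from each of them. This metric has
# no isometric embedding into l_2, because point 3 would have to coincide
# with point 1.
d = [
    [0, 1, 2, 1],
    [1, 0, 1, 1],
    [2, 1, 0, 1],
    [1, 1, 1, 0],
]
# A hypothetical embedding into the plane.
f = {0: (0.0, 0.0), 1: (1.0, 0.0), 2: (2.0, 0.0), 3: (1.0, 1.0)}

with_all_points = outlier_distortion(d, f, set())  # sqrt(2), about 1.414
without_point_3 = outlier_distortion(d, f, {3})    # 1.0, i.e. isometric
```

In this toy instance, declaring the single point 3 an outlier ($k=1$) lowers the distortion of the embedding from $\sqrt{2}$ to $1$. The sketch only scores a given embedding and a given outlier set; choosing them well is the hard problem the abstract refers to.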