We study the design of embeddings into Euclidean space with outliers. Given a metric space $(X,d)$ and an integer $k$, the goal is to embed all but $k$ points of $X$ (called the "outliers") into $\ell_2$ with the smallest possible distortion $c$. Finding the optimal distortion $c$ for a given outlier set size $k$, or, alternatively, the smallest $k$ for a given target distortion $c$, are both NP-hard problems. In fact, it is UGC-hard to approximate $k$ to within a factor smaller than $2$, even when the metric without the outliers is isometrically embeddable into $\ell_2$. We therefore consider bi-criteria approximations. Our main result is a polynomial-time algorithm that approximates the outlier set size to within an $O(\log^4 k)$ factor and the distortion to within a constant factor. The main technical component of our result is an approach for composing two given embeddings of subsets of $X$ into $\ell_2$ so that the composition inherits the distortion of each to within small multiplicative factors. Specifically, given a low-distortion ($c_S$) embedding of $S\subset X$ into $\ell_2$ and a higher-distortion ($c_X$) embedding of the entire set $X$ into $\ell_2$, we construct a single embedding that achieves the same distortion $c_S$ over pairs of points in $S$ and an expansion of at most $O(\log k)\cdot c_X$ over the remaining pairs of points, where $k=|X\setminus S|$. Our composition theorem extends to embeddings into arbitrary $\ell_p$ metrics for $p\ge 1$ and may be of independent interest. While unions of embeddings over disjoint sets have been studied previously, to our knowledge this is the first work to consider compositions of nested embeddings.
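To make the objective concrete, the sketch below evaluates the quantity being minimized: the distortion of a given map into $\ell_2$ once a chosen outlier set is ignored. It assumes the standard definition of distortion as (maximum expansion) times (maximum contraction) over all pairs of inliers. The function name `distortion_without_outliers` and the 4-cycle example are hypothetical illustrations; this is not the paper's composition algorithm or its bi-criteria approximation, only a checker for a candidate solution.

```python
import itertools
import math


def distortion_without_outliers(points, dist, embed, outliers):
    """Distortion of the map `embed` restricted to the points not in `outliers`.

    Hypothetical helper for illustration. `dist(x, y)` is the metric on X and
    `embed[x]` is a coordinate tuple, so image distances are Euclidean (l_2).
    Outliers are ignored entirely. Assumes the standard definition:
    distortion = (max expansion) * (max contraction) over pairs of inliers.
    """
    inliers = [x for x in points if x not in outliers]
    expansion, contraction = 0.0, 0.0
    has_pair = False
    for x, y in itertools.combinations(inliers, 2):
        d = dist(x, y)  # positive for distinct points of a metric space
        e = math.dist(embed[x], embed[y])
        if e == 0:
            return math.inf  # two distinct inliers collapsed to one point
        expansion = max(expansion, e / d)
        contraction = max(contraction, d / e)
        has_pair = True
    # Fewer than two inliers: nothing to distort.
    return expansion * contraction if has_pair else 1.0


def c4_metric(i, j):
    """Shortest-path metric on the 4-cycle 0-1-2-3-0 (example metric)."""
    k = abs(i - j)
    return min(k, 4 - k)


X = [0, 1, 2, 3]
# No outliers: embed all four points as the corners of a unit square.
square = {0: (0, 0), 1: (1, 0), 2: (1, 1), 3: (0, 1)}
# Declare point 3 an outlier: the remaining path 0-1-2 lies on a line.
line = {0: (0,), 1: (1,), 2: (2,)}

print(distortion_without_outliers(X, c4_metric, square, outliers=set()))  # approximately sqrt(2) = 1.41421...
print(distortion_without_outliers(X, c4_metric, line, outliers={3}))      # 1.0
```

In this toy example, removing a single outlier ($k=1$) turns a square embedding with distortion $\sqrt{2}$ into an isometric one, which is the trade-off between $k$ and $c$ that the bi-criteria guarantee controls.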