We study the design of embeddings into Euclidean space with outliers. Given a metric space $(X,d)$ and an integer $k$, the goal is to embed all but $k$ points of $X$ (called the ``outliers'') into $\ell_2$ with the smallest possible distortion $c$. Finding the optimal distortion $c$ for a given outlier set size $k$ and, alternatively, finding the smallest $k$ for a given target distortion $c$ are both NP-hard problems. In fact, it is UGC-hard to approximate $k$ to within a factor smaller than $2$, even when the metric without its outliers is isometrically embeddable into $\ell_2$. We therefore consider bi-criteria approximations. Our main result is a polynomial-time algorithm that approximates the outlier set size to within an $O(\log^2 k)$ factor and the distortion to within a constant factor. The main technical component of our result is an approach for constructing Lipschitz extensions of embeddings into Banach spaces (such as $\ell_p$ spaces). We consider a stronger version of Lipschitz extension that we call a \textit{nested composition of embeddings}: given a low-distortion embedding of a subset $S$ of the metric space $X$, our goal is to extend this embedding to all of $X$ such that the distortion over $S$ is preserved, whereas the distortion over the remaining pairs of points in $X$ is bounded by a function of the size of $X\setminus S$. Prior work on Lipschitz extension considers settings where $X$ is potentially much larger than $S$ and the expansion bounds depend on $|S|$. In our setting, the set $S$ is nearly all of $X$, and the remaining set $X\setminus S$ (the outliers) is small. We achieve an expansion bound that is logarithmic in $|X\setminus S|$.
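To make the objective concrete, the following is a sketch of the standard definitions, written in notation assumed here: the symbols $f$, $K$, $r$ and the term ``$(k,c)$-outlier embedding'' do not appear in the text above. A map $f\colon X\setminus K\to\ell_2$ has distortion at most $c$ if, for some scaling $r>0$,
\[
r\, d(x,y) \;\le\; \|f(x)-f(y)\|_2 \;\le\; c\, r\, d(x,y) \qquad \text{for all } x,y\in X\setminus K,
\]
and $(X,d)$ admits a $(k,c)$-outlier embedding if such an $f$ exists for some $K\subseteq X$ with $|K|\le k$. In this notation, the bi-criteria guarantee reads: whenever $(X,d)$ admits a $(k,c)$-outlier embedding, the algorithm outputs an $(O(\log^2 k)\cdot k,\; O(c))$-outlier embedding.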