Recently, Elkin, Filtser, and Neiman (2017) introduced the concept of a {\it terminal embedding} from one metric space $(X,d_X)$ to another $(Y,d_Y)$ with respect to a set of designated terminals $T\subset X$. Such an embedding $f$ is said to have distortion $\rho\ge 1$ if $\rho$ is the smallest value for which there exists a constant $C>0$ satisfying \begin{equation*} \forall x\in T\ \forall q\in X,\ C d_X(x, q) \le d_Y(f(x), f(q)) \le C \rho d_X(x, q) . \end{equation*} When $X$ and $Y$ are both Euclidean metrics and $Y$ is $m$-dimensional, Narayanan and Nelson (2019), building on work of Mahabadi, Makarychev, Makarychev, and Razenshteyn (2018), recently showed that distortion $1+\epsilon$ is achievable by such a terminal embedding with $m = O(\epsilon^{-2}\log n)$, where $n := |T|$. This generalizes the Johnson-Lindenstrauss lemma, which preserves only distances within $T$, not distances between $T$ and the rest of the space. The downside of this prior work is that evaluating the embedding at a point $q\in \mathbb{R}^d$ required solving a semidefinite program with $\Theta(n)$ constraints in~$m$ variables, and thus required superlinear $\mathrm{poly}(n)$ time. Our main contribution is a new data structure for computing terminal embeddings. We show how to preprocess $T$ into an almost linear-space data structure that computes the terminal embedding image of any $q\in\mathbb{R}^d$ in sublinear time $O^* (n^{1-\Theta(\epsilon^2)} + d)$. To accomplish this, we leverage tools developed for approximate nearest neighbor search.
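The distortion definition above can be checked numerically on finite samples. The sketch below is only an illustration under stated assumptions. It is not the paper's data structure, and `empirical_distortion` and `gaussian_projection` are hypothetical helper names. Both metrics are assumed Euclidean.

The key observation is that choosing $C$ as the smallest ratio $d_Y/d_X$ is optimal, so $\rho$ equals the largest ratio divided by the smallest. Because only finitely many $q$ are tested, the result is only a lower bound on the true distortion over all of $X$. In particular, a linear JL-style map into $m < d$ dimensions has a nonzero kernel vector $v$. The query $q = x + v$ is therefore mapped onto $f(x)$, which makes the true distortion infinite. This is one way to see why a terminal embedding cannot simply be a linear projection.

```python
import math
import random
from typing import Callable, List, Sequence

Vector = Sequence[float]
Embedding = Callable[[Vector], List[float]]


def empirical_distortion(f: Embedding, terminals: Sequence[Vector],
                         queries: Sequence[Vector]) -> float:
    """Smallest rho such that some C > 0 satisfies
    C*d_X(x, q) <= d_Y(f(x), f(q)) <= C*rho*d_X(x, q)
    for every terminal x and every *sampled* query q (Euclidean metrics).
    This is a lower bound on the distortion over all of X."""
    images = [f(q) for q in queries]  # evaluate f once per query point
    ratios = []
    for x in terminals:
        fx = f(x)
        for q, fq in zip(queries, images):
            dx = math.dist(x, q)
            if dx == 0.0:
                continue  # q == x: both bounds are 0, so no constraint
            ratios.append(math.dist(fx, fq) / dx)
    if not ratios:
        raise ValueError("need at least one pair with x != q")
    lo, hi = min(ratios), max(ratios)
    if lo == 0.0:
        # f sends some q != x onto f(x): no C > 0 meets the lower bound.
        return math.inf
    # The best constant is C = lo, which forces rho = hi / lo.
    return hi / lo


def gaussian_projection(d: int, m: int, seed: int = 0) -> Embedding:
    """Hypothetical JL-style linear map v -> Pi v, Pi with i.i.d. N(0, 1/m) entries.
    It is NOT a terminal embedding of R^d when m < d (nontrivial kernel)."""
    rng = random.Random(seed)
    scale = 1.0 / math.sqrt(m)
    rows = [[rng.gauss(0.0, 1.0) * scale for _ in range(d)] for _ in range(m)]

    def f(v: Vector) -> List[float]:
        return [sum(a * b for a, b in zip(row, v)) for row in rows]

    return f


if __name__ == "__main__":
    rng = random.Random(1)
    d, n, m = 100, 10, 40
    terminals = [[rng.gauss(0.0, 1.0) for _ in range(d)] for _ in range(n)]
    extra = [[rng.gauss(0.0, 1.0) for _ in range(d)] for _ in range(20)]
    f = gaussian_projection(d, m, seed=2)
    # q ranges over X, which contains T, so the terminals are queries too.
    print(empirical_distortion(f, terminals, terminals + extra))
```

With random Gaussian queries the printed value is almost surely finite, because such queries avoid the kernel with probability one. A one-coordinate projection such as `lambda v: [v[0]]` exposes the kernel problem directly. For the terminal $(0,0)$ and the query $(0,1)$ it returns `math.inf`.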