The celebrated Johnson-Lindenstrauss lemma states that for all $\varepsilon \in (0,1)$ and finite sets $X \subseteq \mathbb{R}^N$ with $n>1$ elements, there exists a matrix $\Phi \in \mathbb{R}^{m \times N}$ with $m=\mathcal{O}(\varepsilon^{-2}\log n)$ such that \[ (1 - \varepsilon) \|x-y\|_2 \leq \|\Phi x-\Phi y\|_2 \leq (1+\varepsilon)\| x- y\|_2 \quad \forall\, x, y \in X.\] Herein we consider terminal embedding results which have recently been introduced in the computer science literature as stronger extensions of the Johnson-Lindenstrauss lemma for finite sets. After a short survey of this relatively recent line of work, we extend the theory of terminal embeddings to hold for arbitrary (e.g., infinite) subsets $X \subseteq \mathbb{R}^N$, and then specialize our generalized results to the case where $X$ is a low-dimensional compact submanifold of $\mathbb{R}^N$. In particular, we prove the following generalization of the Johnson-Lindenstrauss lemma: For all $\varepsilon \in (0,1)$ and $X\subseteq\mathbb{R}^N$, there exists a terminal embedding $f: \mathbb{R}^N \longrightarrow \mathbb{R}^{m}$ such that $$(1 - \varepsilon) \| x - y \|_2 \leq \left\| f(x) - f(y) \right\|_2 \leq (1 + \varepsilon) \| x - y \|_2 \quad \forall \, x \in X ~{\rm and}~ \forall \, y \in \mathbb{R}^N.$$ Crucially, we show that the dimension $m$ of the range of $f$ above is optimal up to multiplicative constants, satisfying $m=\mathcal{O}(\varepsilon^{-2} \omega^2(S_X))$, where $\omega(S_X)$ is the Gaussian width of the set of unit secants of $X$, $S_X=\overline{\{(x-y)/\|x-y\|_2 \colon x \neq y \in X\}}$. Furthermore, our proofs are constructive and yield algorithms for computing a general class of terminal embeddings $f$, an instance of which is demonstrated herein to allow for more accurate compressive nearest neighbor classification than standard linear Johnson-Lindenstrauss embeddings do in practice.
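As a rough numerical illustration of the classical Johnson-Lindenstrauss statement quoted above, the following sketch draws a scaled Gaussian random matrix $\Phi$ and measures the worst pairwise distortion on a finite point set. It covers only the standard linear embedding, not the terminal embeddings $f$ constructed in the paper. The constant 8 in the choice of $m$, the problem sizes, and the helper names (`jl_matrix`, `pairwise_distances`, `max_distortion`) are assumptions made for this example and do not come from the source.

```python
import numpy as np


def jl_matrix(m, N, rng):
    """Gaussian matrix with i.i.d. N(0, 1/m) entries (one common JL construction)."""
    return rng.standard_normal((m, N)) / np.sqrt(m)


def pairwise_distances(Y):
    """Euclidean distances between all distinct pairs of rows of Y."""
    diffs = Y[:, None, :] - Y[None, :, :]
    dists = np.linalg.norm(diffs, axis=-1)
    upper = np.triu_indices(Y.shape[0], k=1)  # each unordered pair once
    return dists[upper]


def max_distortion(X, Phi):
    """Largest value of | ||Phi x - Phi y|| / ||x - y|| - 1 | over pairs in X."""
    d_orig = pairwise_distances(X)
    d_proj = pairwise_distances(X @ Phi.T)
    return float(np.max(np.abs(d_proj / d_orig - 1.0)))


rng = np.random.default_rng(0)
n, N, eps = 50, 1000, 0.5

# Assumed constant: m = ceil(8 * log(n) / eps^2), an illustrative choice
# consistent with m = O(eps^-2 log n), not a value taken from the paper.
m = int(np.ceil(8 * np.log(n) / eps**2))

X = rng.standard_normal((n, N))   # n random points in R^N
Phi = jl_matrix(m, N, rng)
distortion = max_distortion(X, Phi)

print(f"m = {m}, N = {N}, max pairwise distortion = {distortion:.3f}")
```

With these sizes the embedding dimension $m$ is far smaller than $N$, and the observed distortion would be expected to fall well below $\varepsilon$ with high probability. Note that such a linear map only controls distances between pairs of points that both lie in $X$, whereas the terminal embedding guarantee above holds for every $y \in \mathbb{R}^N$.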