Linear representation learning is widely studied due to its conceptual simplicity and empirical utility in tasks such as compression, classification, and feature extraction. Given a set of points $[\mathbf{x}_1, \mathbf{x}_2, \ldots, \mathbf{x}_n] = \mathbf{X} \in \mathbb{R}^{d \times n}$ and a vector $\mathbf{y} \in \mathbb{R}^d$, the goal is to find coefficients $\mathbf{w} \in \mathbb{R}^n$ so that $\mathbf{X} \mathbf{w} \approx \mathbf{y}$, subject to some desired structure on $\mathbf{w}$. In this work we seek $\mathbf{w}$ that forms a local reconstruction of $\mathbf{y}$ by solving a regularized least-squares regression problem. We obtain local solutions through a locality function that, when used as a regularization term, promotes the use of columns of $\mathbf{X}$ that are close to $\mathbf{y}$. We prove that, for all levels of regularization and under the mild condition that the columns of $\mathbf{X}$ have a unique Delaunay triangulation, the optimal coefficient vector has at most $d+1$ non-zero entries, thereby providing local sparse solutions when $d \ll n$. Under the same condition we also show that, for any $\mathbf{y}$ contained in the convex hull of the columns of $\mathbf{X}$, there exists a range of the regularization parameter for which the optimal coefficients are supported on the vertices of the Delaunay simplex containing $\mathbf{y}$. This provides an interpretation of the sparsity as having structure obtained implicitly from the Delaunay triangulation of $\mathbf{X}$. We demonstrate that our locality-regularized problem can be solved in time comparable to that of other methods that identify the containing Delaunay simplex.
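The abstract does not spell out the objective, so the following is only a minimal sketch under stated assumptions. It assumes the locality-regularized problem takes the form $\min_{\mathbf{w}} \tfrac{1}{2}\|\mathbf{X}\mathbf{w} - \mathbf{y}\|^2 + \rho \sum_i w_i \|\mathbf{x}_i - \mathbf{y}\|^2$ with $\mathbf{w}$ constrained to the probability simplex. It solves that problem with a generic projected gradient method, which is a stand-in and not necessarily the solver used in this work. The function names, the parameter `rho`, and the toy data are all hypothetical.

```python
import numpy as np


def project_simplex(v):
    """Euclidean projection of v onto {w : w >= 0, sum(w) = 1} (sort-based)."""
    u = np.sort(v)[::-1]                      # entries in decreasing order
    css = np.cumsum(u) - 1.0
    k = np.arange(1, v.size + 1)
    k_max = k[u - css / k > 0][-1]            # number of entries kept positive
    tau = css[k_max - 1] / k_max              # shift that makes the result sum to 1
    return np.maximum(v - tau, 0.0)


def locality_regularized_weights(X, y, rho, n_iter=5000):
    """Projected gradient descent on the ASSUMED objective
    0.5 * ||X w - y||^2 + rho * sum_i w_i * ||x_i - y||^2, with w on the simplex.
    """
    n = X.shape[1]
    c = np.sum((X - y[:, None]) ** 2, axis=0)   # squared distances ||x_i - y||^2
    step = 1.0 / np.linalg.norm(X, 2) ** 2      # 1 / Lipschitz constant of the gradient
    w = np.full(n, 1.0 / n)                     # start at the simplex barycenter
    for _ in range(n_iter):
        grad = X.T @ (X @ w - y) + rho * c
        w = project_simplex(w - step * grad)
    return w


# Toy data (hypothetical): four points in the plane, stored as columns of X.
X = np.array([[0.0, 1.0, 0.0, 2.0],
              [0.0, 0.0, 1.0, 2.0]])
y = np.array([0.25, 0.25])

w = locality_regularized_weights(X, y, rho=0.1)
support = np.flatnonzero(w > 1e-8)              # indices of the non-zero coefficients
print(support, np.round(w, 4))
```

In this toy example the circumcircle of the first three points (center $(0.5, 0.5)$, radius $\sqrt{0.5}$) excludes the fourth point, so those three points form the Delaunay simplex containing $\mathbf{y}$. Under the assumed objective, the optimality conditions give $\mathbf{w} = (0.5 + \rho,\ 0.25 - \rho/2,\ 0.25 - \rho/2,\ 0)$ for $0 < \rho < 0.5$, so with $\rho = 0.1$ the iterates should approach $(0.6, 0.2, 0.2, 0)$. The support then has $d + 1 = 3$ entries and coincides with the vertices of that simplex.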