This paper investigates the spectral norm version of the column subset selection problem. Given a matrix $\mathbf{A}\in\mathbb{R}^{n\times d}$ and a positive integer $k\leq\text{rank}(\mathbf{A})$, the objective is to select exactly $k$ columns of $\mathbf{A}$ that minimize the spectral norm of the residual matrix after projecting $\mathbf{A}$ onto the space spanned by the selected columns. We use the method of interlacing polynomials introduced by Marcus, Spielman, and Srivastava to derive an asymptotically sharp upper bound on the minimal approximation error, and we propose a deterministic polynomial-time algorithm that achieves this error bound (up to a computational error). Furthermore, we extend our result to a column partition problem in which the columns of $\mathbf{A}$ are to be partitioned into $r\geq 2$ subsets such that $\mathbf{A}$ is well approximated by subsets drawn from the various groups. We show that the machinery of interlacing polynomials also works in this context, and we establish a connection between the relevant expected characteristic polynomials and the $r$-characteristic polynomials introduced by Ravichandran and Leake. As a consequence, we prove that the columns of a rank-$d$ matrix $\mathbf{A}\in\mathbb{R}^{n\times d}$ can be partitioned into $r$ subsets $S_1,\ldots,S_r$ such that, for each $1\leq i\leq r$, the column space of $\mathbf{A}$ is well approximated by the span of the columns in the complement of $S_i$.
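In symbols, the selection objective described above can be written as follows. The notation is an assumption made here for illustration and is not taken from the text: $\mathbf{A}_S$ denotes the submatrix of $\mathbf{A}$ formed by the columns indexed by $S$, $\mathbf{A}_S^{\dagger}$ its Moore-Penrose pseudoinverse (so that $\mathbf{A}_S\mathbf{A}_S^{\dagger}$ is the orthogonal projection onto the span of the selected columns), and $\|\cdot\|_2$ the spectral norm.

$$
\min_{\substack{S\subseteq\{1,\ldots,d\}\\ |S|=k}} \left\|\mathbf{A}-\mathbf{A}_S\mathbf{A}_S^{\dagger}\mathbf{A}\right\|_2
$$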