The $\ell_p$ subspace approximation problem is an NP-hard low rank approximation problem that generalizes the median hyperplane problem ($p = 1$), principal component analysis ($p = 2$), and the center hyperplane problem ($p = \infty$). A popular approach to coping with the NP-hardness of this problem is to compute a strong coreset: a small weighted subset of the input points that simultaneously approximates the cost of every $k$-dimensional subspace, typically to $(1+\varepsilon)$ relative error for a small constant $\varepsilon$. We obtain the first algorithm for constructing a strong coreset for $\ell_p$ subspace approximation with a nearly optimal dependence on the rank parameter $k$, achieving a nearly linear size bound of $\tilde O(k)\,\mathrm{poly}(\varepsilon^{-1})$ for $p<2$ and a bound of $\tilde O(k^{p/2})\,\mathrm{poly}(\varepsilon^{-1})$ for $p>2$. Prior constructions either achieved a similar size bound but produced a coreset consisting of modified versions of the original points [SW18, FKW21], or produced a coreset of the original points but lost $\mathrm{poly}(k)$ factors in the coreset size [HV20, WY23]. Our techniques also lead to the first nearly optimal online strong coresets for $\ell_p$ subspace approximation, with bounds similar to those in the offline setting, resolving a problem posed in [WY23]. All prior approaches lose $\mathrm{poly}(k)$ factors in this setting, even when allowed to modify the original points.
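For concreteness, the strong coreset guarantee described above can be written out as follows for finite $p$. This is a sketch in the standard formulation, and the notation is assumed here rather than taken from the abstract: $a_1, \dots, a_n \in \mathbb{R}^d$ are the input points, $\mathcal{F}_k$ is the set of $k$-dimensional linear subspaces of $\mathbb{R}^d$, $\mathrm{dist}(a, F)$ is the Euclidean distance from $a$ to $F$, and $S \subseteq \{1, \dots, n\}$ with weights $w_i > 0$ is the coreset.

$$
\mathrm{cost}_p(F) = \sum_{i=1}^{n} \mathrm{dist}(a_i, F)^p,
\qquad
(1-\varepsilon)\,\mathrm{cost}_p(F) \;\le\; \sum_{i \in S} w_i\,\mathrm{dist}(a_i, F)^p \;\le\; (1+\varepsilon)\,\mathrm{cost}_p(F)
\quad \text{for all } F \in \mathcal{F}_k .
$$

The requirement that the inequality hold for every $F \in \mathcal{F}_k$ at once is what makes the coreset strong, and the size bounds quoted above refer to $|S|$.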