A PTAS for $\ell_0$-Low Rank Approximation: Solving Dense CSPs over Reals

We consider the Low Rank Approximation problem, where the input consists of a matrix $A \in \mathbb{R}^{n_R \times n_C}$ and an integer $k$, and the goal is to find a matrix $B$ of rank at most $k$ that minimizes $\| A - B \|_0$, the number of entries where $A$ and $B$ differ. For any constant $k$ and $\varepsilon > 0$, we present a polynomial time $(1 + \varepsilon)$-approximation algorithm for this problem, which significantly improves on the previous best $\mathrm{poly}(k)$-approximation. Our algorithm is obtained by viewing the problem as a Constraint Satisfaction Problem (CSP) where each row and each column becomes a variable that takes a value from $\mathbb{R}^k$. In this view, we have a constraint between each row and each column, which results in a {\em dense} CSP, a well-studied topic in approximation algorithms. While most previous algorithms focus on finite-size (or constant-size) domains and involve an exhaustive enumeration over the entire domain, we present a new framework that bypasses such an enumeration in $\mathbb{R}^k$. We also use tools from the rich literature on Low Rank Approximation for different objectives (e.g., $\ell_p$ with $p \in (0, \infty)$) or domains (e.g., finite fields/generalized Boolean). We believe that our techniques might be useful for studying other real-valued CSPs and matrix optimization problems. On the hardness side, when $k$ is part of the input, we prove that Low Rank Approximation is NP-hard to approximate within a factor of $\Omega(\log n)$. This is the first superconstant NP-hardness of approximation result for any $p \in [0, \infty]$ that does not rely on stronger conjectures (e.g., the Small Set Expansion Hypothesis).
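
To make the CSP view concrete, the following minimal sketch evaluates the $\ell_0$ objective for one candidate assignment. It only scores a given solution and is not the approximation algorithm of the paper. The function name `violated_constraints`, the factor names `U` and `V`, and the floating-point tolerance `tol` are illustrative assumptions: row $i$ is assigned a vector $u_i \in \mathbb{R}^k$, column $j$ a vector $v_j \in \mathbb{R}^k$, and the constraint on the pair $(i, j)$ is taken to be satisfied when $A_{ij} = \langle u_i, v_j \rangle$.

```python
from typing import Sequence

Matrix = Sequence[Sequence[float]]


def violated_constraints(A: Matrix, U: Matrix, V: Matrix, tol: float = 1e-9) -> int:
    """Count pairs (i, j) with A[i][j] != <U[i], V[j]>.

    With B = U V^T (rank at most k, where k is the length of each vector),
    this count equals ||A - B||_0. The tolerance `tol` is an assumption made
    only to cope with floating-point arithmetic.
    """
    count = 0
    for i, row in enumerate(A):
        for j, a_ij in enumerate(row):
            inner = sum(x * y for x, y in zip(U[i], V[j]))
            if abs(a_ij - inner) > tol:
                count += 1
    return count


# Hypothetical 3x3 instance with k = 1: a rank-1 matrix with one corrupted entry.
A = [
    [1.0, 2.0, 3.0],
    [2.0, 4.0, 6.0],
    [3.0, 6.0, 0.0],  # a rank-1 fit with the factors below would need 9.0 here
]
U = [[1.0], [2.0], [3.0]]  # one vector in R^1 per row
V = [[1.0], [2.0], [3.0]]  # one vector in R^1 per column

cost = violated_constraints(A, U, V)
print(cost)  # 1
```

In this toy instance $A$ has rank 2, so no rank-1 matrix agrees with it everywhere, and the single violated constraint is optimal for $k = 1$. Every (row, column) pair carries a constraint, which is what makes the CSP dense.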