We introduce efficient $(1+\varepsilon)$-approximation algorithms for the binary matrix factorization (BMF) problem. The inputs are a matrix $\mathbf{A}\in\{0,1\}^{n\times d}$, a rank parameter $k>0$, and an accuracy parameter $\varepsilon>0$, and the goal is to approximate $\mathbf{A}$ as a product of low-rank factors $\mathbf{U}\in\{0,1\}^{n\times k}$ and $\mathbf{V}\in\{0,1\}^{k\times d}$. Equivalently, we want to find $\mathbf{U}$ and $\mathbf{V}$ that minimize the Frobenius loss $\|\mathbf{U}\mathbf{V} - \mathbf{A}\|_F^2$. Before this work, the state of the art for this problem was the approximation algorithm of Kumar et al. [ICML 2019], which achieves a $C$-approximation for some constant $C\ge 576$. We give the first $(1+\varepsilon)$-approximation algorithm with running time singly exponential in $k$, where $k$ is typically a small integer. Our techniques generalize to other common variants of the BMF problem, admitting bicriteria $(1+\varepsilon)$-approximation algorithms for $L_p$ loss functions and for the setting where matrix operations are performed over $\mathbb{F}_2$. Our approach can be implemented in standard big data models, such as the streaming and distributed models.
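To make the objective concrete, the following is a minimal sketch, assuming NumPy, that evaluates the loss $\|\mathbf{U}\mathbf{V} - \mathbf{A}\|_F^2$ for binary factors, both with $\mathbf{U}\mathbf{V}$ computed over the integers and with it reduced modulo 2 (the $\mathbb{F}_2$ variant). It also includes an exhaustive solver for tiny instances. This is not the algorithm described above: the brute-force search is exponential in $nk$ and serves only as a reference point. The names `bmf_loss` and `brute_force_bmf` are hypothetical. The solver relies on the fact that, for a fixed $\mathbf{U}$, the loss splits into a sum over the columns of $\mathbf{A}$, so each column of $\mathbf{V}$ can be chosen independently from the $2^k$ binary candidates.

```python
import itertools

import numpy as np


def bmf_loss(U, V, A, field="real"):
    """Squared Frobenius loss ||UV - A||_F^2 for binary U, V, A.

    field="real": UV is computed over the integers (standard BMF).
    field="f2":   UV is reduced modulo 2 (the F_2 variant).
    """
    P = U @ V
    if field == "f2":
        P = P % 2
    return int(np.sum((P - A) ** 2))


def brute_force_bmf(A, k, field="real"):
    """Exact BMF by exhaustive search over every binary U (tiny inputs only).

    For a fixed U, ||UV - A||_F^2 = sum_j ||U V[:, j] - A[:, j]||^2, so each
    column of V is picked independently among the 2^k binary vectors.
    Returns (U, V, loss) for an optimal pair.
    """
    n, d = A.shape
    # All 2^k candidate columns of V, as k x 1 integer arrays.
    cols = [np.array(c, dtype=int).reshape(k, 1)
            for c in itertools.product([0, 1], repeat=k)]
    best = (None, None, float("inf"))
    for bits in itertools.product([0, 1], repeat=n * k):
        U = np.array(bits, dtype=int).reshape(n, k)
        V = np.zeros((k, d), dtype=int)
        total = 0
        for j in range(d):
            # Best binary column v minimizing ||U v - A[:, j]||^2.
            errs = [bmf_loss(U, c, A[:, [j]], field) for c in cols]
            i = int(np.argmin(errs))
            V[:, [j]] = cols[i]
            total += errs[i]
        if total < best[2]:
            best = (U, V, total)
    return best


if __name__ == "__main__":
    A = np.array([[1, 1, 0], [0, 1, 1], [1, 0, 1]])
    # Over F_2 the third row is the XOR of the first two, so rank 2 is exact.
    print(brute_force_bmf(A, 2, "f2")[2])
    # Over the integers, no binary rank-2 product reproduces A exactly.
    print(brute_force_bmf(A, 2)[2])
```

The example matrix illustrates why the $\mathbb{F}_2$ setting is a genuinely different problem. Its three rows have pairwise XORs that stay inside the row set, so the exact $\mathbb{F}_2$ loss is zero. Over the integers, however, the sum of two binary rows with overlapping support is not binary, so some error is unavoidable.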