Tensor completion is a core machine learning algorithm used in recommender systems and other domains with missing data. While the matrix case is well-understood, theoretical results for tensor problems are limited, particularly when the sampling patterns are deterministic. Here we bound the generalization error of the solutions of two tensor completion methods, Poisson loss and atomic norm minimization, providing tighter bounds in terms of the target tensor rank. If the ground-truth tensor is order $t$ with CP-rank $r$, the dependence on $r$ is improved from $r^{2(t-1)(t^2-t-1)}$ in arXiv:1910.10692 to $r^{2(t-1)(3t-5)}$. The error in our bounds is deterministically controlled by the spectral gap of the sampling sparsity pattern. We also prove several new properties for the atomic tensor norm, reducing the rank dependence from $r^{3t-3}$ in arXiv:1711.04965 to $r^{3t-5}$ under random sampling schemes. A limitation is that atomic norm minimization, while theoretically interesting, leads to inefficient algorithms. However, numerical experiments illustrate the dependence of the reconstruction error on the spectral gap for the practical max-quasinorm, ridge penalty, and Poisson loss minimization algorithms. This view through the spectral gap is a promising window for further study of tensor algorithms.
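To make the size of the rank improvement concrete, the exponents quoted above can be evaluated for small tensor orders. The following is only an illustrative sketch: the function name `rank_exponents` and its return layout are hypothetical, and the formulas are copied directly from the abstract rather than from the proofs.

```python
def rank_exponents(t: int) -> dict:
    """Evaluate the exponents of r quoted in the abstract for tensor order t.

    Hypothetical helper for illustration only; it restates the formulas
    from the abstract and does not reproduce any part of the analysis.
    """
    return {
        # Deterministic sampling: previous bound (arXiv:1910.10692) vs. this work
        "deterministic_old": 2 * (t - 1) * (t**2 - t - 1),
        "deterministic_new": 2 * (t - 1) * (3 * t - 5),
        # Random sampling, atomic norm: previous bound (arXiv:1711.04965) vs. this work
        "random_old": 3 * t - 3,
        "random_new": 3 * t - 5,
    }


if __name__ == "__main__":
    for t in range(2, 6):
        e = rank_exponents(t)
        # The deterministic saving simplifies algebraically to 2(t-1)(t-2)^2
        saving = e["deterministic_old"] - e["deterministic_new"]
        print(t, e, "deterministic saving:", saving)
```

For an order-3 tensor, the deterministic exponent drops from 20 to 16 and the random-sampling exponent from 6 to 4. Since $(t^2-t-1)-(3t-5)=(t-2)^2$, the deterministic saving is $2(t-1)(t-2)^2$, which vanishes in the matrix case $t=2$ and grows cubically in $t$.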