Tensor Clustering with Planted Structures: Statistical Optimality and Computational Limits

This paper studies the statistical and computational limits of high-order clustering with planted structures. We focus on two clustering models, constant high-order clustering (CHC) and rank-one higher-order clustering (ROHC). For each, we develop methods and theory both for testing whether a cluster exists (detection) and for identifying the support of the cluster (recovery). Specifically, we identify the sharp signal-to-noise ratio boundaries at which CHC and ROHC detection and recovery become statistically possible. We also establish tight computational thresholds. When the signal-to-noise ratio falls below these thresholds, we prove that no polynomial-time algorithm can solve these problems, under the computational hardness conjectures of hypergraphic planted clique (HPC) detection and hypergraphic planted dense subgraph (HPDS) recovery. Conversely, we propose polynomial-time tensor algorithms that achieve reliable detection and recovery when the signal-to-noise ratio is above these thresholds. Both sparsity and tensor structure give rise to computational barriers in high-order tensor clustering. Their interplay leads to significant differences between high-order tensor clustering and the matrix clustering studied in the literature. These differences appear in the statistical and computational phase transition diagrams, the algorithmic approaches, the hardness conjectures, and the proof techniques. To the best of our knowledge, this is the first thorough characterization of the statistical and computational trade-off for such a double computational-barrier problem. Finally, we provide evidence for the computational hardness conjectures of HPC detection (via the low-degree polynomial and Metropolis methods) and HPDS recovery (via the low-degree polynomial method).
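
To make the detection problem concrete, here is a minimal sketch of an order-3 constant-cluster setting and a simple polynomial-time detection test. It assumes a model of the form Y = X + Z. Here Z has i.i.d. N(0, 1) entries, and under the alternative X equals a constant lam on a k x k x k block. The functions `sample_chc` and `sum_test` and the threshold `alpha_z` are hypothetical names chosen for illustration. The total-sum statistic is a standard baseline test, not necessarily the procedure proposed in the paper.

```python
import math
import random


def sample_chc(n, k, lam, planted, seed=0):
    """Draw an n x n x n tensor Y = X + Z under an assumed CHC-style model.

    Z has i.i.d. N(0, 1) entries. If planted is True, X equals lam on the
    block supports[0] x supports[1] x supports[2] (each of size k) and is 0
    elsewhere; otherwise X = 0 (the null hypothesis: no cluster).
    """
    rng = random.Random(seed)
    supports = [rng.sample(range(n), k) for _ in range(3)] if planted else [[], [], []]
    Y = [[[rng.gauss(0.0, 1.0) for _ in range(n)] for _ in range(n)] for _ in range(n)]
    # Add the constant signal on the planted k x k x k block.
    for i in supports[0]:
        for j in supports[1]:
            for l in supports[2]:
                Y[i][j][l] += lam
    return Y, supports


def sum_test(Y, alpha_z=3.0):
    """Reject "no cluster" when the normalized total sum exceeds alpha_z.

    Under the null the statistic sum(Y) / n^{3/2} is exactly N(0, 1), so
    alpha_z = 3 gives a one-sided false-alarm rate of about 0.13%.
    """
    n = len(Y)
    total = sum(v for plane in Y for row in plane for v in row)
    stat = total / math.sqrt(n ** 3)
    return stat > alpha_z, stat
```

Under the planted model the statistic has mean lam * k^3 / n^{3/2}. A test of this kind therefore succeeds only when the cluster is large relative to n. This is one simple way to see why the sparsity of the cluster matters for what polynomial-time methods can detect.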

翻译：本文研究了具有植入结构的高阶聚类的统计与计算极限。我们聚焦于两种聚类模型——恒定高阶聚类（CHC）与秩一高阶聚类（ROHC），并探讨了判断聚类是否存在（检测）以及识别聚类支撑集（恢复）的方法与理论。具体而言，我们确定了CHC与ROHC检测/恢复在统计上可行的信噪比尖锐边界。同时，我们建立了严格的计算阈值：当信噪比低于这些阈值时，我们证明在超图植入团（HPC）检测与超图植入稠密子图（HPDS）恢复的计算困难性猜想下，多项式时间算法无法解决这些问题。而当信噪比高于这些阈值时，我们提出了能够实现可靠检测与恢复的多项式时间张量算法。稀疏性与张量结构共同构成了高阶张量聚类的计算障碍，二者的相互作用导致高阶张量聚类与文献中矩阵聚类在统计与计算相变图、算法方法、困难性猜想及证明技术等方面存在显著差异。据我们所知，我们是首个对此类双重计算障碍问题的统计与计算权衡进行完整刻画的研究。最后，我们通过低次多项式方法与Metropolis方法验证了HPC检测的困难性猜想，并通过低次多项式方法验证了HPDS恢复的困难性猜想。