Higher-order tensor datasets arise commonly in recommendation systems, neuroimaging, and social networks. Here we develop provable methods for estimating a possibly high-rank signal tensor from noisy observations. We consider a generative latent variable tensor model that incorporates both high-rank and low-rank models, including, but not limited to, simple hypergraphon models, single index models, low-rank CP models, and low-rank Tucker models. We develop comprehensive results on both the statistical and computational limits of signal tensor estimation. We find that high-dimensional latent variable tensors are of log-rank; this fact explains the pervasiveness of low-rank tensors in applications. Furthermore, we propose a polynomial-time spectral algorithm that achieves the computationally optimal rate. We show that the statistical-computational gap emerges only for latent variable tensors of order 3 or higher. Numerical experiments and two real-data applications are presented to demonstrate the practical merits of our methods.