Tensor PCA is a stylized statistical inference problem introduced by Montanari and Richard to study the computational difficulty of estimating an unknown parameter from higher-order moment tensors. Unlike its matrix counterpart, Tensor PCA exhibits a statistical-computational gap, i.e., a sample size regime where the problem is information-theoretically solvable but conjectured to be computationally hard. This paper derives computational lower bounds on the run-time of memory-bounded algorithms for Tensor PCA using communication complexity. These lower bounds specify a trade-off among the number of passes through the data sample, the sample size, and the memory required by any algorithm that successfully solves Tensor PCA. While the lower bounds do not rule out polynomial-time algorithms, they do imply that many commonly used algorithms, such as gradient descent and the power method, must have a higher iteration count when the sample size is not large enough. Similar lower bounds are obtained for Non-Gaussian Component Analysis, a family of statistical estimation problems in which low-order moment tensors carry no information about the unknown parameter. Finally, stronger lower bounds are obtained for an asymmetric variant of Tensor PCA and related statistical estimation problems. These results explain why many estimators for these problems use a memory state that is significantly larger than the effective dimensionality of the parameter of interest.