We study the complexity of Non-Gaussian Component Analysis (NGCA) in the Statistical Query (SQ) model. Prior work developed a general methodology for proving SQ lower bounds for this task, which has been applied in a wide range of contexts. In particular, it was known that for any univariate distribution $A$ satisfying certain conditions, it is SQ-hard to distinguish between a standard multivariate Gaussian and a distribution that behaves like $A$ in a random hidden direction and like a standard Gaussian in the orthogonal complement. The required conditions were that (1) $A$ matches many low-order moments with the standard univariate Gaussian, and (2) the chi-squared norm of $A$ with respect to the standard Gaussian is finite. While the moment-matching condition is necessary for hardness, the chi-squared condition was only required for technical reasons. In this work, we establish that the latter condition is indeed not necessary. In particular, we prove near-optimal SQ lower bounds for NGCA under the moment-matching condition only. Our result naturally generalizes to the setting of a hidden subspace. Leveraging our general SQ lower bound, we obtain near-optimal SQ lower bounds for a range of concrete estimation tasks where existing techniques provide sub-optimal or even vacuous guarantees.
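For concreteness, the objects described above can be written out as follows. This is a sketch in standard notation that the abstract itself does not fix: the symbols $d$ (ambient dimension), $v$ (hidden unit direction), $P_v^A$ (hidden-direction distribution), $m$ (number of matched moments), and $\phi$ (standard univariate Gaussian density) are assumptions introduced here, and $A$ is treated as having a density for the purpose of the formulas.

$$P_v^A(x) \;=\; A(v \cdot x)\,(2\pi)^{-(d-1)/2}\exp\!\left(-\tfrac{1}{2}\,\bigl\|x - (v \cdot x)\,v\bigr\|_2^2\right), \qquad x \in \mathbb{R}^d,\ \|v\|_2 = 1,$$

$$\text{(1)}\quad \mathbf{E}_{a \sim A}\bigl[a^k\bigr] \;=\; \mathbf{E}_{g \sim N(0,1)}\bigl[g^k\bigr] \quad \text{for all } 1 \le k \le m, \qquad\qquad \text{(2)}\quad \chi^2\bigl(A, N(0,1)\bigr) \;=\; \int_{\mathbb{R}} \frac{A(t)^2}{\phi(t)}\,dt \;-\; 1 \;<\; \infty.$$

Under this reading, the distinguishing task is $N(0, I_d)$ versus $P_v^A$ for a uniformly random unit vector $v$. Condition (2) fails, for example, whenever $A$ is a discrete distribution, since such an $A$ has no density with respect to the Gaussian and its chi-squared norm is infinite.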