Multiway data analysis aims to infer patterns from data represented as multi-dimensional arrays. Estimating covariance from multiway data is a fundamental statistical task; however, the intrinsic high dimensionality poses significant statistical and computational challenges. Recently, several factorized covariance models, paired with estimation algorithms, have been proposed to circumvent these obstacles. Despite several promising results on the algorithmic front, it remains under-explored whether and when such a model is valid. To address this question, we define the notion of Kronecker-separable multiway covariance, which can be written as a sum of $r$ tensor products of mode-wise covariances. The question of whether a given covariance can be represented as a separable multiway covariance then reduces to an equivalent question about the separability of quantum states. Using this equivalence, it follows directly that a generic multiway covariance tends to be non-separable (even as $r \to \infty$), and moreover, that finding its best separable approximation is NP-hard. These observations imply that factorized covariance models are restrictive and should be used only when there is a compelling rationale for such a model.
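A minimal two-mode sketch of the definition follows. It assumes real-valued data, the row-major ordering used by `np.kron`, and hypothetical helper names (`random_psd`, `kronecker_separable_cov`, `partial_transpose_mode2`) introduced only for illustration. It builds a covariance of the form $\Sigma = \sum_{i=1}^{r} \Sigma_i^{(1)} \otimes \Sigma_i^{(2)}$ and applies one simple necessary condition borrowed from quantum separability, the partial transpose. Each real mode-wise covariance is symmetric, so transposing the second mode leaves every Kronecker term, and therefore any finite sum of them, unchanged. A matrix that fails this check cannot be written as such a sum for any $r$, and a generic covariance fails it. This is an illustration under those assumptions, not necessarily the argument used in the paper.

```python
import numpy as np


def random_psd(p, rng):
    """Random symmetric positive semidefinite p x p matrix (stand-in mode-wise covariance)."""
    g = rng.standard_normal((p, p))
    return g @ g.T


def kronecker_separable_cov(factors):
    """Sum of Kronecker products of mode-wise covariances.

    factors: list of r tuples (Sigma_1, ..., Sigma_K), one tuple per term.
    """
    total = None
    for term in factors:
        prod = term[0]
        for s in term[1:]:
            prod = np.kron(prod, s)  # Sigma_1 (x) Sigma_2 (x) ... for this term
        total = prod if total is None else total + prod
    return total


def partial_transpose_mode2(m, p1, p2):
    """Transpose the second mode of a (p1*p2) x (p1*p2) matrix ordered like np.kron(A, B)."""
    t = m.reshape(p1, p2, p1, p2)  # t[i, k, j, l] = m[(i, k), (j, l)]
    # Swap the two mode-2 indices k and l: result[(i, k), (j, l)] = m[(i, l), (j, k)].
    return t.transpose(0, 3, 2, 1).reshape(p1 * p2, p1 * p2)


rng = np.random.default_rng(0)
p1, p2, r = 3, 4, 5

# A Kronecker-separable covariance with r terms.
sep = kronecker_separable_cov(
    [(random_psd(p1, rng), random_psd(p2, rng)) for _ in range(r)]
)
print(np.allclose(sep, partial_transpose_mode2(sep, p1, p2)))  # True

# A generic covariance on the same (p1*p2)-dimensional space.
generic = random_psd(p1 * p2, rng)
print(np.allclose(generic, partial_transpose_mode2(generic, p1, p2)))  # False
```

The partial-transpose check is only a necessary condition: passing it does not certify separability (the abstract notes that finding the best separable approximation is NP-hard in general), but failing it rules separability out for every $r$.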