Multiway data analysis aims to uncover patterns in data structured as multi-indexed arrays, and the covariance of such data plays a crucial role in various machine learning applications. However, the intrinsically high dimension of multiway covariance presents significant challenges. To address these challenges, factorized covariance models have been proposed that rely on a separability assumption: the multiway covariance can be accurately expressed as a sum of Kronecker products of mode-wise covariances. This paper is concerned with the accuracy of such separable models for representing multiway covariances. We reduce the question of whether a given covariance can be represented as a separable multiway covariance to an equivalent question about the separability of quantum states. Based on this equivalence, we establish that generic multiway covariances tend not to be separable. Moreover, we show that determining the best separable approximation of a generic covariance is NP-hard. Our results suggest that, without additional assumptions ensuring separability, factorized covariance models might not accurately approximate the covariance. To balance these negative results, we propose an iterative Frank-Wolfe algorithm for computing Kronecker-separable covariance approximations with some additional side information. We establish an oracle complexity bound for the algorithm and empirically observe that it converges consistently to a separable limit point, often close to the "best" separable approximation. These results suggest that practical methods may be able to find a Kronecker-separable approximation of covariances, despite the worst-case NP-hardness results.
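To make the separability assumption concrete, the following minimal sketch builds a covariance as a sum of Kronecker products of mode-wise positive semidefinite factors and compares its parameter count with that of an unstructured covariance. It illustrates only the forward construction, not the Frank-Wolfe algorithm proposed in the paper. The helper names (`random_psd`, `kronecker_separable_covariance`), the mode dimensions, and the number of terms are hypothetical choices made for illustration.

```python
import numpy as np


def random_psd(dim, rng):
    """Draw a random symmetric positive semidefinite matrix (illustrative only)."""
    a = rng.standard_normal((dim, dim))
    return a @ a.T / dim


def kronecker_separable_covariance(terms):
    """Sum, over terms, of the Kronecker product of each term's mode-wise factors."""
    total = None
    for mode_factors in terms:
        prod = np.ones((1, 1))
        for mode_cov in mode_factors:
            prod = np.kron(prod, mode_cov)
        total = prod if total is None else total + prod
    return total


rng = np.random.default_rng(0)
dims = (2, 3, 4)   # assumed mode dimensions of a 3-way array
num_terms = 2      # assumed number of Kronecker terms in the sum

terms = [[random_psd(d, rng) for d in dims] for _ in range(num_terms)]
sigma = kronecker_separable_covariance(terms)

# Free parameters: unstructured symmetric matrix vs. the factorized model.
full_dim = sigma.shape[0]
n_full = full_dim * (full_dim + 1) // 2
n_factored = num_terms * sum(d * (d + 1) // 2 for d in dims)

print(sigma.shape, n_full, n_factored)
```

Because each factor is positive semidefinite, every Kronecker product and hence the sum is positive semidefinite by construction. The paper's negative results concern the converse direction: a generic covariance of this size need not admit such a representation.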