Implementing many statistical methods for large, multivariate data sets requires solving a linear system whose dimension is, depending on the method, either the number of observations or the length of each individual data vector. This is often the limiting factor in scaling the method with data size and complexity. In this paper we illustrate the use of Krylov subspace methods to address this issue in a statistical solution to a source separation problem in cosmology, where the data size is prohibitively large for direct solution of the required system. Two distinct approaches, adapted from techniques in the literature, are described: one applies the method of conjugate gradients directly to the Kronecker-structured problem, and the other reformulates the system as a Sylvester matrix equation. We show that both approaches produce an accurate solution within an acceptable computation time and with practical memory requirements for the data size that is currently available.
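To make the link between the two formulations concrete, the following is a minimal sketch on a toy problem, not the system treated in the paper. It assumes a Kronecker-sum operator built from two small symmetric positive definite matrices `A` and `B` (random, hypothetical data), for which the linear system is equivalent to the Sylvester equation AX + XB = C. The helper names `random_spd`, `apply_operator`, and `cg_kronecker` are illustrative only. Conjugate gradients is run matrix-free, so the full Kronecker matrix is never formed, and the result is compared against SciPy's dense Sylvester solver.

```python
import numpy as np
from scipy.linalg import solve_sylvester


def random_spd(n, rng):
    """Return a well-conditioned symmetric positive definite matrix (toy data)."""
    M = rng.standard_normal((n, n))
    return M @ M.T / n + np.eye(n)


def apply_operator(A, B, X):
    """Apply the Kronecker-structured operator without forming it.

    With row-major vec and symmetric B,
    (A kron I + I kron B) vec(X) = vec(A X + X B).
    """
    return A @ X + X @ B


def cg_kronecker(A, B, C, tol=1e-10, max_iter=500):
    """Conjugate gradients for A X + X B = C, assuming A and B are SPD.

    Inner products are Frobenius inner products on the matrix iterates,
    which equal Euclidean inner products on the vectorised unknowns.
    """
    X = np.zeros_like(C)
    R = C - apply_operator(A, B, X)   # residual
    P = R.copy()                      # search direction
    rs_old = np.sum(R * R)            # squared Frobenius norm of the residual
    c_norm = np.linalg.norm(C)
    for k in range(max_iter):
        if np.sqrt(rs_old) <= tol * c_norm:
            return X, k
        AP = apply_operator(A, B, P)
        alpha = rs_old / np.sum(P * AP)
        X += alpha * P
        R -= alpha * AP
        rs_new = np.sum(R * R)
        P = R + (rs_new / rs_old) * P
        rs_old = rs_new
    return X, max_iter


rng = np.random.default_rng(0)
n, m = 40, 30                          # toy sizes, chosen only for illustration
A, B = random_spd(n, rng), random_spd(m, rng)
C = rng.standard_normal((n, m))

X_cg, iters = cg_kronecker(A, B, C)    # matrix-free Krylov solve
X_syl = solve_sylvester(A, B, C)       # dense direct Sylvester solve

rel_diff = np.linalg.norm(X_cg - X_syl) / np.linalg.norm(X_syl)
print(f"CG iterations: {iters}, relative difference: {rel_diff:.2e}")
```

In this sketch each conjugate gradient iteration costs two dense matrix products and stores only a handful of n-by-m arrays, whereas the explicit Kronecker matrix would have (nm)² entries. That difference in storage is the reason a matrix-free Krylov method remains feasible when a direct solve does not.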