Implementing many statistical methods for large, multivariate data sets requires solving a linear system whose dimension, depending on the method, is either the number of observations or the length of each individual data vector. This is often the limiting factor in scaling the method with data size and complexity. In this paper we illustrate the use of Krylov subspace methods to address this issue in a statistical solution to a source separation problem in cosmology, where the data size is prohibitively large for direct solution of the required system. Two distinct approaches, adapted from techniques in the literature, are described: one that applies the method of conjugate gradients directly to the Kronecker-structured problem, and another that reformulates the system as a Sylvester matrix equation. We show that both approaches produce an accurate solution within an acceptable computation time and with practical memory requirements for the data size that is currently available.
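As a rough illustration of the two viewpoints, the sketch below solves a small toy problem in Python with NumPy and SciPy. It is not the system from the paper: the matrices `A`, `B`, `C`, their sizes, and the Kronecker-sum form are all assumptions chosen for illustration. The toy system is (I ⊗ A + B ⊗ I) vec(X) = vec(C), which is the same problem as the Sylvester equation A X + X B = C when `B` is symmetric. Conjugate gradients is applied matrix-free, so the full Kronecker matrix is never formed. The Sylvester form is solved with SciPy's dense direct solver, used here only as a reference solution and not as a Krylov method.

```python
import numpy as np
from scipy.linalg import solve_sylvester
from scipy.sparse.linalg import LinearOperator, cg

rng = np.random.default_rng(0)

def random_spd(k):
    """Return a random symmetric positive definite k x k matrix (toy data)."""
    M = rng.standard_normal((k, k))
    return M @ M.T + k * np.eye(k)

# Hypothetical sizes and matrices; the real problem is far larger.
n, m = 6, 4
A = random_spd(n)                 # n x n, symmetric positive definite
B = random_spd(m)                 # m x m, symmetric positive definite
C = rng.standard_normal((n, m))   # right-hand side, stored as a matrix

def matvec(x):
    """Apply the Kronecker-structured operator without forming it.

    Reshaping x to an n x m matrix X, the operator maps X to A X + X B.
    """
    X = x.reshape(n, m)
    return (A @ X + X @ B).ravel()

# Approach 1: conjugate gradients on the vectorised system, matrix-free.
op = LinearOperator((n * m, n * m), matvec=matvec, dtype=float)
x_cg, info = cg(op, C.ravel(), maxiter=500)
X_cg = x_cg.reshape(n, m)

# Approach 2: treat the same problem as the Sylvester equation A X + X B = C.
# solve_sylvester is a dense direct solver, used here as a reference only.
X_syl = solve_sylvester(A, B, C)

print("CG converged:", info == 0)
print("max difference between solutions:", np.abs(X_cg - X_syl).max())
```

The matrix-free operator needs storage only for `A`, `B`, and a few n x m work arrays, rather than the (nm) x (nm) Kronecker matrix. That saving is what makes an iterative solver attractive when a direct solve of the full system does not fit in memory.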