The big data era of science and technology motivates statistical modeling of matrix-valued data using a low-rank representation that simultaneously summarizes key characteristics of the data and enables dimension reduction for data compression and storage. Low-rank representations such as the singular value decomposition (SVD) factor the original data into the product of orthonormal basis functions and weights, where each basis function represents an independent feature of the data. However, the basis functions in these factorizations are typically computed using algorithmic methods that cannot quantify uncertainty or account for explicit structure beyond what is implicitly specified via data correlation. We propose a flexible prior distribution for orthonormal matrices that can explicitly model structure in the basis functions. The prior is used within a general probabilistic model for the SVD to conduct posterior inference on the basis functions while accounting for measurement error and fixed effects. To contextualize the proposed prior and model, we discuss how the prior specification can be used in various scenarios and relate the model to its deterministic counterpart. We demonstrate favorable model properties through synthetic data examples and apply our method to sea surface temperature data from the northern Pacific, enhancing our understanding of the ocean's internal variability.
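For readers unfamiliar with the factorization the abstract refers to, the following is a minimal sketch of the deterministic truncated SVD only, not the proposed prior or probabilistic model. The helper name `truncated_svd`, the matrix sizes, the noise level, and the use of NumPy are all assumptions made for illustration; none of them come from the paper.

```python
import numpy as np


def truncated_svd(Y, rank):
    """Return the leading `rank` SVD factors (U, d, Vt) of the matrix Y.

    Hypothetical helper for illustration: columns of U and rows of Vt are
    orthonormal basis vectors, and d holds the singular values (weights).
    """
    U, d, Vt = np.linalg.svd(Y, full_matrices=False)
    return U[:, :rank], d[:rank], Vt[:rank, :]


# Synthetic data (assumed sizes): a rank-3 signal plus small measurement noise.
rng = np.random.default_rng(0)
n, m, k = 50, 30, 3
signal = rng.standard_normal((n, k)) @ rng.standard_normal((k, m))
Y = signal + 0.01 * rng.standard_normal((n, m))

U, d, Vt = truncated_svd(Y, k)

# Low-rank reconstruction: scale each basis column by its singular value.
Y_hat = (U * d) @ Vt

# Relative Frobenius-norm reconstruction error of the rank-k approximation.
rel_err = np.linalg.norm(Y - Y_hat) / np.linalg.norm(Y)

# Number of stored values: full matrix versus the three low-rank factors.
full_storage = n * m
low_rank_storage = n * k + k + k * m
```

This computation returns a single point estimate of the basis vectors with no measure of uncertainty, which is the limitation the abstract describes for algorithmic methods.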