Statistical modeling of high-dimensional matrix-valued data motivates the use of a low-rank representation that simultaneously summarizes key characteristics of the data and enables dimension reduction. Low-rank representations commonly factor the original data into the product of orthonormal basis functions and weights, where each basis function represents an independent feature of the data. However, the basis functions in these factorizations are typically computed using algorithmic methods that cannot quantify uncertainty or account for basis function correlation structure a priori. While there exist Bayesian methods that allow for a common correlation structure across basis functions, empirical examples motivate the need for basis function-specific dependence structure. We propose a prior distribution for orthonormal matrices that can explicitly model basis function-specific structure. The prior is used within a general probabilistic model for singular value decomposition to conduct posterior inference on the basis functions while accounting for measurement error and fixed effects. We discuss how the prior specification can be used for various scenarios and demonstrate favorable model properties through synthetic data examples. Finally, we apply our method to two-meter air temperature data from the Pacific Northwest, enhancing our understanding of the Earth system's internal variability.
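As a point of reference for the low-rank factorization described above, the following is a minimal sketch of the classical, purely algorithmic truncated SVD on synthetic data. It is not the proposed prior or the probabilistic model: it yields point estimates of the orthonormal basis functions with no uncertainty quantification. The function name `truncated_svd`, the dimensions, the singular values, and the noise level are all illustrative assumptions.

```python
import numpy as np


def truncated_svd(Y, k):
    """Return the rank-k factors (U_k, d_k, Vt_k) with Y ~ U_k @ diag(d_k) @ Vt_k."""
    U, d, Vt = np.linalg.svd(Y, full_matrices=False)
    return U[:, :k], d[:k], Vt[:k, :]


# Synthetic data (assumed setup): a rank-3 signal plus Gaussian measurement error.
rng = np.random.default_rng(0)
n, m, k = 50, 30, 3
U_true, _ = np.linalg.qr(rng.standard_normal((n, k)))  # orthonormal columns
V_true, _ = np.linalg.qr(rng.standard_normal((m, k)))  # orthonormal columns
d_true = np.array([10.0, 6.0, 3.0])
Y = U_true @ np.diag(d_true) @ V_true.T + 0.1 * rng.standard_normal((n, m))

# Point estimates of the basis functions and weights; no posterior uncertainty.
U_k, d_k, Vt_k = truncated_svd(Y, k)
Y_hat = U_k @ np.diag(d_k) @ Vt_k
rel_err = np.linalg.norm(Y - Y_hat) / np.linalg.norm(Y)
print(f"relative reconstruction error: {rel_err:.3f}")
```

The estimated columns of `U_k` and rows of `Vt_k` are orthonormal by construction, but the procedure encodes no prior dependence structure for any individual basis function, which is the gap the abstract's Bayesian formulation is meant to address.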