The Matérn family of covariance functions is currently the most widely used model in spatial statistics, geostatistics, and machine learning for specifying the correlation between two geographical locations based on their spatial distance. Compared to other covariance functions, the Matérn family offers more flexibility in data fitting because a dedicated parameter controls the smoothness of the field. Moreover, it generalizes other popular covariance functions. However, fitting the smoothness parameter is computationally challenging because it complicates the optimization process. As a result, some practitioners fix the smoothness parameter at an arbitrary value to reduce the optimization convergence time. Studies in the literature have used various parameterizations of the Matérn covariance function, assuming that they are equivalent. This work studies the effectiveness of different parameterizations under various settings. We demonstrate the feasibility of inferring all parameters simultaneously and quantifying their uncertainties on large-scale data using the ExaGeoStat parallel software. We also highlight the importance of the smoothness parameter by analyzing the Fisher information of the statistical parameters. We show that the various parameterizations have different properties and differ from several perspectives. In particular, we study the three most popular parameterizations in terms of parameter estimation accuracy, modeling accuracy and efficiency, prediction efficiency, uncertainty quantification, and asymptotic properties. We further demonstrate their differing performance under nugget effects and approximated covariance. Lastly, we give recommendations for parameterization selection based on our experimental results.
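To make the notion of a parameterization concrete, the following is a minimal sketch in Python using only the standard library. It is an illustration under stated assumptions, not the implementation used in this work or in ExaGeoStat. The two forms shown are commonly seen textbook variants and are not necessarily the three parameterizations studied here; the function names `bessel_k`, `matern_range`, and `matern_scaled`, and the parameter names `sigma2`, `beta`, `rho`, and `nu`, are hypothetical. The Bessel function is approximated by numerical integration, which is adequate for moderate distances but is not a production-grade routine.

```python
import math


def bessel_k(nu, x, t_max=20.0, steps=4000):
    """Modified Bessel function of the second kind, K_nu(x), for x > 0.

    Uses the integral representation
        K_nu(x) = integral_0^inf exp(-x * cosh(t)) * cosh(nu * t) dt
    approximated with the trapezoid rule on [0, t_max].
    Illustrative only: very small x would need a larger t_max.
    """
    h = t_max / steps
    total = 0.0
    for i in range(steps + 1):
        t = i * h
        weight = 0.5 if i in (0, steps) else 1.0
        total += weight * math.exp(-x * math.cosh(t)) * math.cosh(nu * t)
    return total * h


def matern_range(h, sigma2, beta, nu):
    """Assumed form: sigma2 * 2^(1-nu) / Gamma(nu) * (h/beta)^nu * K_nu(h/beta)."""
    if h == 0.0:
        return sigma2  # the covariance at zero distance is the variance
    x = h / beta
    return sigma2 * 2.0 ** (1.0 - nu) / math.gamma(nu) * x ** nu * bessel_k(nu, x)


def matern_scaled(h, sigma2, rho, nu):
    """Assumed variant in which h/beta is replaced by sqrt(2*nu) * h / rho."""
    return matern_range(h, sigma2, rho / math.sqrt(2.0 * nu), nu)


if __name__ == "__main__":
    h, sigma2, length = 0.3, 1.0, 0.1
    for nu in (0.5, 1.5):
        print(nu, matern_range(h, sigma2, length, nu),
              matern_scaled(h, sigma2, length, nu))
```

With `nu = 0.5` both forms reduce to the exponential covariance `sigma2 * exp(-h / beta)`, one of the covariance functions the Matérn family generalizes. For other smoothness values, the same numerical range value yields different covariances under the two forms, so the range parameter does not carry the same meaning across parameterizations, and the scaled form ties its effective range to the smoothness.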