Non-Gaussian statistics are a challenge for data assimilation. Linear methods oversimplify the problem, yet fully nonlinear methods are often too expensive to use in practice. The best solution usually lies between these extremes. Triangular measure transport offers a flexible framework for nonlinear data assimilation, but its success depends on how the transport map is parametrized: too much flexibility leads to overfitting, while too little misses important structure. To strike this balance, we develop an adaptation algorithm that automatically selects a parsimonious parametrization. Our method uses P-spline basis functions and an information criterion that measures model complexity continuously. This continuous formulation enables gradient descent and allows efficient, fine-scale adaptation in high-dimensional settings. The resulting algorithm requires no hyperparameter tuning: it adjusts the transport map to the appropriate level of complexity based on the system statistics and the ensemble size. We demonstrate its performance on nonlinear, non-Gaussian problems, including a high-dimensional distributed groundwater model.
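The abstract pairs P-spline basis functions with an information criterion that treats model complexity as a continuous quantity. The sketch below illustrates only that pairing, in a one-dimensional regression setting rather than for a triangular transport map. It assumes a standard choice that the text does not confirm: Eilers and Marx style P-splines (cubic B-splines on equally spaced knots with a second-order difference penalty), complexity measured by effective degrees of freedom (EDF), and AIC as the criterion. The function names `bspline_basis` and `fit_pspline` are hypothetical.

```python
import numpy as np


def bspline_basis(x, n_segments=20, degree=3):
    """B-spline basis on equally spaced knots, via the Cox-de Boor recursion.

    Returns an array of shape (len(x), n_segments + degree).
    """
    x = np.asarray(x, dtype=float)
    xmin, xmax = x.min(), x.max()
    dx = (xmax - xmin) / n_segments
    # Equally spaced knots, extended by `degree` extra knots on each side
    knots = xmin + dx * np.arange(-degree, n_segments + degree + 1)
    # Degree-0 basis: indicator of the knot interval containing each x
    B = ((x[:, None] >= knots[None, :-1]) & (x[:, None] < knots[None, 1:])).astype(float)
    # Raise the degree one step at a time; each step drops one column
    for k in range(1, degree + 1):
        n = B.shape[1] - 1
        left = (x[:, None] - knots[:n]) / (knots[k:k + n] - knots[:n])
        right = (knots[k + 1:k + 1 + n] - x[:, None]) / (knots[k + 1:k + 1 + n] - knots[1:1 + n])
        B = left * B[:, :n] + right * B[:, 1:n + 1]
    return B


def fit_pspline(x, y, lam, n_segments=20, degree=3, order=2):
    """Penalized least-squares P-spline fit (hypothetical helper).

    Returns (coefficients, effective degrees of freedom, AIC).
    """
    B = bspline_basis(x, n_segments, degree)
    # Difference penalty on neighbouring coefficients
    D = np.diff(np.eye(B.shape[1]), n=order, axis=0)
    BtB = B.T @ B
    A = BtB + lam * D.T @ D
    coef = np.linalg.solve(A, B.T @ y)
    # EDF = trace of the hat matrix B A^{-1} B^T = trace(A^{-1} B^T B)
    edf = np.trace(np.linalg.solve(A, BtB))
    rss = np.sum((y - B @ coef) ** 2)
    n = len(y)
    # Gaussian AIC (up to an additive constant), with EDF in place of a parameter count
    aic = n * np.log(rss / n) + 2.0 * edf
    return coef, edf, aic


# Usage on synthetic data: EDF falls smoothly as the penalty weight grows
rng = np.random.default_rng(0)
x = np.sort(rng.uniform(0.0, 1.0, 200))
y = np.sin(2 * np.pi * x) + 0.2 * rng.standard_normal(200)
for lam in np.logspace(-4, 6, 6):
    _, edf, aic = fit_pspline(x, y, lam)
    print(f"lam={lam:9.1e}  edf={edf:6.2f}  aic={aic:8.2f}")
```

Because the EDF varies smoothly with the penalty weight `lam`, moving from the number of basis functions (no penalty) down to 2 (a straight line, the null space of the second-order penalty), the criterion is a differentiable function of `log(lam)`. It could therefore be minimized with gradient-based methods instead of a discrete search over basis sizes. How the paper itself defines and optimizes its criterion for transport maps is not shown here.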