Amortized Probabilistic Retrieval of Atmospheric CO2 from OCO-2 Spectra Using Deep Learning with Laplace Approximations and Normalizing Flows

Space-based monitoring of atmospheric carbon dioxide (CO2) is essential for constraining the global carbon budget. NASA's Orbiting Carbon Observatory-2 (OCO-2) estimates column-averaged dry-air mole fractions of CO2 (XCO2) from high-resolution spectra. However, current operational retrieval algorithms are computationally expensive and do not properly quantify uncertainties. We present a novel deep learning framework that addresses both challenges. Because ground-truth data are difficult to obtain for real satellite observations, we develop and validate our approach on a high-fidelity simulation dataset. This dataset, created to support OCO-2 uncertainty quantification (UQ), incorporates realistic forward model errors. Our architecture encodes the spectral bands with a multi-branch neural network and estimates posteriors of the full CO2 column, or of desired summaries thereof, using two scalable UQ methods: Laplace approximations and normalizing flows. Our approach has five key advantages relative to operational "full-physics" solvers: (1) Amortization: inference is orders of magnitude faster, enabling real-time processing of massive data streams; (2) Model error robustness: by training on simulations that explicitly include model discrepancies, our method accounts for systematic errors often neglected by standard inversions; (3) Point estimate accuracy: we achieve higher predictive accuracy than baseline methods; (4) Improved UQ: the probabilistic outputs yield better-calibrated uncertainty estimates; and (5) Non-Gaussian posteriors: with normalizing flows, our framework models complex, asymmetric posterior distributions, overcoming the limitations of the Gaussian assumption. These results suggest that simulation-based deep learning is a viable path toward next-generation operational processing systems.
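
As an illustration of estimating the posterior of a summary of the CO2 column, the sketch below pushes a Gaussian posterior over the column (the form a Laplace approximation yields) through a linear summary such as a weighted column average. It is a minimal sketch under stated assumptions, not the implementation described in the abstract: the function name summarize_column, the three-level column, the covariance values, and the weights are all hypothetical and chosen only for illustration.

```python
import math


def summarize_column(mean, cov, weights):
    """Propagate a Gaussian posterior N(mean, cov) over the CO2 column
    through the linear summary s = sum_i weights[i] * x[i].

    Returns the posterior mean and standard deviation of s.
    """
    n = len(mean)
    # Mean of a linear function of a Gaussian: w^T mu
    s_mean = sum(w * m for w, m in zip(weights, mean))
    # Variance of a linear function of a Gaussian: w^T Sigma w
    s_var = sum(
        weights[i] * cov[i][j] * weights[j]
        for i in range(n)
        for j in range(n)
    )
    return s_mean, math.sqrt(s_var)


# Hypothetical 3-level column posterior in ppm (illustrative values only).
mean = [410.0, 405.0, 400.0]
cov = [
    [4.0, 1.0, 0.0],
    [1.0, 4.0, 1.0],
    [0.0, 1.0, 4.0],
]
# Assumed weighting over levels; the weights sum to 1.
weights = [0.25, 0.5, 0.25]

xco2_mean, xco2_std = summarize_column(mean, cov, weights)

# Approximate 95% credible interval under the Gaussian assumption.
lower = xco2_mean - 1.96 * xco2_std
upper = xco2_mean + 1.96 * xco2_std
print(f"summary mean = {xco2_mean:.2f} ppm, std = {xco2_std:.2f} ppm")
print(f"95% interval = [{lower:.2f}, {upper:.2f}] ppm")
```

The closed form holds only when the posterior is Gaussian and the summary is linear. With a normalizing flow, the same summary would instead be estimated by drawing posterior samples of the column and applying the weights to each sample, which is what allows asymmetric intervals.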
