Space-based monitoring of atmospheric carbon dioxide (CO2) is essential for constraining the global carbon budget. NASA's Orbiting Carbon Observatory-2 (OCO-2) estimates column-averaged dry-air mole fractions of CO2 (XCO2) from high-resolution spectra. However, current operational retrieval algorithms are computationally expensive and do not adequately quantify uncertainty. We present a novel deep learning framework that addresses both limitations. Because ground-truth data are difficult to obtain for real satellite observations, we develop and validate our approach on a high-fidelity simulation dataset. This dataset, created to support OCO-2 uncertainty quantification (UQ), incorporates realistic forward model errors. Our architecture encodes the spectral bands with a multi-branch neural network and estimates posteriors of the full CO2 column, or of desired summaries of it, using two scalable UQ methods: Laplace approximations and normalizing flows. Our approach has five key advantages relative to operational "full-physics" solvers: (1) Amortization: Inference is orders of magnitude faster, enabling real-time processing of massive data streams; (2) Model error robustness: By training on simulations that explicitly include model discrepancies, our method accounts for systematic errors often neglected by standard inversions; (3) Point estimate accuracy: We achieve higher predictive accuracy than baseline methods; (4) Improved UQ: The probabilistic outputs yield better-calibrated uncertainty estimates; and (5) Non-Gaussian posteriors: When using normalizing flows, our framework models complex, asymmetric posterior distributions, overcoming the limitations of the Gaussian assumption. These results suggest that simulation-based deep learning is a viable path toward next-generation operational processing systems.
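As a rough illustration of the architecture described above, the following NumPy sketch wires one encoder branch per spectral band into a shared head that predicts a CO2 profile, then pushes a Gaussian posterior over that profile through a linear summary. It is not the authors' implementation: the band names, the 1016 samples per band, the 20-level profile, the layer sizes, the uniform summary weights, and the diagonal covariance are all assumptions made for illustration, and the network is untrained.

```python
import numpy as np

rng = np.random.default_rng(0)

# Assumed sizes (not stated in the abstract): three spectral bands of
# 1016 samples each and a 20-level CO2 profile.
BAND_SIZES = {"o2_a": 1016, "weak_co2": 1016, "strong_co2": 1016}
N_LEVELS = 20
HIDDEN = 32  # hypothetical per-branch feature width


def init_linear(n_in, n_out):
    """Return (weights, bias) for one dense layer with small random weights."""
    weights = rng.normal(0.0, 1.0 / np.sqrt(n_in), size=(n_in, n_out))
    return weights, np.zeros(n_out)


# One independent encoder branch per band, plus a shared head.
branches = {name: init_linear(n, HIDDEN) for name, n in BAND_SIZES.items()}
head = init_linear(HIDDEN * len(BAND_SIZES), N_LEVELS)


def predict_profile(spectra):
    """Map per-band spectra, each (batch, n_samples), to (batch, N_LEVELS)."""
    feats = []
    for name, (w, b) in branches.items():
        feats.append(np.tanh(spectra[name] @ w + b))  # branch-specific features
    joint = np.concatenate(feats, axis=-1)  # fuse the branches
    w, b = head
    return joint @ w + b


def summarize_gaussian(mean, cov, weights):
    """Push N(mean, cov) over the profile through the linear summary weights @ x.

    A linear function of a Gaussian vector is Gaussian, so the summary has
    mean weights @ mean and variance weights @ cov @ weights.
    """
    return weights @ mean, weights @ cov @ weights


# Random stand-in spectra for a batch of 4 soundings.
batch = {name: rng.normal(size=(4, n)) for name, n in BAND_SIZES.items()}
profiles = predict_profile(batch)

# Placeholder posterior for the first sounding: uniform summary weights and a
# made-up diagonal covariance, both illustrative only.
weights = np.full(N_LEVELS, 1.0 / N_LEVELS)
cov = 0.5 * np.eye(N_LEVELS)
xco2_mean, xco2_var = summarize_gaussian(profiles[0], cov, weights)
```

The Gaussian push-through applies only when the posterior over the column is Gaussian, as with a Laplace approximation. With a normalizing flow, a summary posterior would more naturally be obtained by sampling profiles and applying the summary to each sample.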