COP-GEN: Latent Diffusion Transformer for Copernicus Earth Observation Data

Earth observation (EO) applications increasingly rely on data from multiple modalities, including optical, radar, elevation, and land-cover data. Relationships between modalities are fundamental for data integration but are inherently non-injective: identical conditioning information can correspond to multiple physically plausible observations, so these relationships should be parameterised as conditional distributions. Deterministic models, by contrast, collapse toward conditional means and fail to represent the uncertainty and variability required for tasks such as data completion and cross-sensor translation. We introduce COP-GEN, a multimodal latent diffusion transformer that models the joint distribution of heterogeneous EO modalities at their native spatial resolutions. By parameterising cross-modal mappings as conditional distributions, COP-GEN enables flexible any-to-any conditional generation, including zero-shot modality translation without task-specific retraining. Experiments show that COP-GEN generates diverse yet physically consistent realisations while maintaining strong peak fidelity across optical, radar, and elevation modalities. Qualitative and quantitative analyses demonstrate that the model captures meaningful cross-modal structure and adapts its output uncertainty as conditioning information increases. We release a stochastic benchmark built from multi-temporal Sentinel-2 observations that enables distribution-level comparison of generative EO models. On this benchmark, COP-GEN covers 90% of the real observation manifold and 63% of its per-band reflectance range, while the strongest competing method collapses to 2.8% and 18%, respectively. These results highlight the importance of stochastic generative modelling for EO and motivate evaluation protocols beyond single-reference, pointwise metrics. Website: https://miquel-espinosa.github.io/cop-gen
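
The claim that deterministic models collapse toward conditional means can be illustrated with a toy example. The sketch below is not COP-GEN's method or data: it assumes a hypothetical one-to-many mapping in which a single conditioning value corresponds to two equally plausible targets near -1 and +1. The constant prediction that minimises squared error lands near 0, a value that is almost never observed, whereas sampling from the conditional distribution (approximated here by resampling the observed targets) recovers both modes.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy data: for one fixed conditioning value, the target is
# either -1 or +1 with equal probability, plus a little noise.
n = 10_000
modes = rng.choice([-1.0, 1.0], size=n)
y = modes + 0.05 * rng.standard_normal(n)

# Deterministic predictor: search for the single constant that minimises
# mean squared error. The minimiser is the (conditional) mean.
candidates = np.linspace(-1.5, 1.5, 301)
mse = ((y[None, :] - candidates[:, None]) ** 2).mean(axis=1)
best_constant = candidates[mse.argmin()]

# Stochastic predictor: draw samples from the conditional distribution,
# approximated here by resampling the observed targets.
samples = rng.choice(y, size=1000)

# Fraction of outputs that lie close to one of the two plausible modes.
det_near_mode = float(abs(abs(best_constant) - 1.0) < 0.3)
sto_near_mode = float(np.mean(np.abs(np.abs(samples) - 1.0) < 0.3))

print(f"MSE-optimal constant: {best_constant:.2f}")
print(f"deterministic output near a mode: {det_near_mode:.2f}")
print(f"stochastic samples near a mode:   {sto_near_mode:.2f}")
```

The same effect is why single-reference, pointwise metrics can favour a blurred average over any individual plausible realisation.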
