Deep learning models have been shown to outperform methods that rely on summary statistics, such as the power spectrum, at extracting information from complex cosmological data sets. However, because subgrid physics implementations and numerical approximations differ across simulation suites, models trained on data from one cosmological simulation lose performance when tested on another. Likewise, a model trained on any of these simulations would probably lose performance when applied to observational data. Training on data from two different suites of the CAMELS hydrodynamic cosmological simulations, we examine the generalization capabilities of Domain Adaptive Graph Neural Networks (DA-GNNs). By using GNNs, we exploit their capacity to capture structured, scale-free cosmological information from galaxy distributions. Moreover, by incorporating unsupervised domain adaptation via Maximum Mean Discrepancy (MMD), we enable our models to extract domain-invariant features. We demonstrate that DA-GNNs achieve higher accuracy and robustness on cross-dataset tasks (up to $28\%$ lower relative error and up to almost an order of magnitude better $\chi^2$). Using data visualizations, we show how domain adaptation improves the alignment of the two domains in latent space. These results indicate that DA-GNNs are a promising method for extracting domain-independent cosmological information, a vital step toward robust deep learning on real cosmic survey data.
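The MMD term can be illustrated with a minimal NumPy sketch. It assumes a sum of Gaussian (RBF) kernels with hand-picked bandwidths and uses the biased estimator of the squared MMD. The function names, the bandwidths, and the random "latent features" are hypothetical stand-ins, not the authors' implementation. In a domain-adaptive setup of this kind, such a term would typically be added, with some weight, to the supervised loss computed on the labeled source domain. This penalizes differences between the source and target latent distributions without requiring target labels.

```python
import numpy as np


def gaussian_kernel(x, y, sigmas=(0.5, 1.0, 2.0)):
    """Sum of Gaussian (RBF) kernels k(x, y) = sum_s exp(-||x - y||^2 / (2 s^2)).

    x: (n, d) array, y: (m, d) array -> (n, m) kernel matrix.
    The bandwidths in `sigmas` are illustrative assumptions.
    """
    # Pairwise squared Euclidean distances via broadcasting: (n, m).
    sq_dist = np.sum((x[:, None, :] - y[None, :, :]) ** 2, axis=-1)
    return sum(np.exp(-sq_dist / (2.0 * s ** 2)) for s in sigmas)


def mmd2(source, target, sigmas=(0.5, 1.0, 2.0)):
    """Biased (V-statistic) estimate of the squared MMD between two samples."""
    k_ss = gaussian_kernel(source, source, sigmas)
    k_tt = gaussian_kernel(target, target, sigmas)
    k_st = gaussian_kernel(source, target, sigmas)
    return k_ss.mean() + k_tt.mean() - 2.0 * k_st.mean()


rng = np.random.default_rng(0)
# Hypothetical latent features for two simulation suites (standing in for
# graph-level embeddings from a GNN encoder); here they are random draws.
z_a = rng.normal(0.0, 1.0, size=(200, 8))
z_b_shifted = rng.normal(1.0, 1.0, size=(200, 8))  # misaligned domain
z_b_aligned = rng.normal(0.0, 1.0, size=(200, 8))  # aligned domain

print(mmd2(z_a, z_b_shifted) > mmd2(z_a, z_b_aligned))  # True
```

A larger value signals that the two sets of latent features are less aligned. Driving this quantity down during training is what pushes the encoder toward domain-invariant features.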