We train deep learning models on thousands of galaxy catalogues from the state-of-the-art hydrodynamic simulations of the CAMELS project to perform regression and inference. We employ Graph Neural Networks (GNNs), architectures designed to work with irregular and sparse data, such as the distribution of galaxies in the Universe. We first show that GNNs can learn to compute the power spectrum of galaxy catalogues to within a few percent. We then train GNNs to perform likelihood-free inference at the galaxy-field level. Our models are able to infer the value of $\Omega_{\rm m}$ with $\sim12\%-13\%$ accuracy from the positions alone of $\sim1000$ galaxies in a volume of $(25~h^{-1}{\rm Mpc})^3$ at $z=0$, while accounting for astrophysical uncertainties as modelled in CAMELS. Incorporating information from galaxy properties, such as stellar mass, stellar metallicity, and stellar radius, improves the accuracy to $4\%-8\%$. Our models are built to be translationally and rotationally invariant, and they can extract information from any scale larger than the minimum distance between two galaxies. However, our models are not completely robust: testing on simulations run with subgrid physics different from that used for training yields less accurate results.
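To illustrate how a galaxy catalogue can be turned into a graph whose features respect the symmetries mentioned above, here is a minimal sketch in plain Python. It is not the authors' implementation. The linking radius `r_link`, the use of periodic boundaries, and the function names `periodic_separation` and `build_graph` are assumptions made for illustration. Galaxies closer than `r_link` are joined by an edge, and the only edge feature stored is their separation, which is unchanged by translations of the box and by rotations that preserve the periodic grid.

```python
import math

BOX_SIZE = 25.0  # box side in h^-1 Mpc, the value quoted in the abstract


def periodic_separation(a, b, box=BOX_SIZE):
    """Minimum-image separation vector a - b in a periodic cubic box (assumed)."""
    sep = []
    for x, y in zip(a, b):
        d = (x - y) % box      # wrapped into [0, box)
        if d > box / 2:        # take the shorter way around the box
            d -= box
        sep.append(d)
    return sep


def build_graph(positions, r_link, box=BOX_SIZE):
    """Link every pair of galaxies closer than r_link (hypothetical parameter).

    Returns a dict mapping (i, j) with i < j to the pair separation.
    The separation is the only edge feature, so the graph carries no
    information about absolute positions or orientations.
    """
    edges = {}
    n = len(positions)
    for i in range(n):
        for j in range(i + 1, n):
            sep = periodic_separation(positions[i], positions[j], box)
            dist = math.sqrt(sum(c * c for c in sep))
            if dist < r_link:
                edges[(i, j)] = dist
    return edges


# Toy catalogue of three galaxies; the first two are neighbours across the box edge.
galaxies = [(1.0, 1.0, 1.0), (24.0, 1.0, 1.0), (12.0, 12.0, 12.0)]
graph = build_graph(galaxies, r_link=3.0)

# Shifting every galaxy by the same vector leaves the edge features unchanged.
shifted = [tuple((c + 5.0) % BOX_SIZE for c in p) for p in galaxies]
graph_shifted = build_graph(shifted, r_link=3.0)
```

The double loop costs $O(N^2)$, which is acceptable for the $\sim1000$ galaxies per box considered here; a larger catalogue would call for a tree-based or grid-based neighbour search. A GNN operating on such a graph can additionally use per-galaxy properties (for example stellar mass) as node features.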