We present AstroCLIP, a strategy to facilitate the construction of astronomical foundation models that bridge the gap between diverse observational modalities. We demonstrate that a cross-modal contrastive learning approach between images and optical spectra of galaxies yields highly informative embeddings of both modalities. Specifically, we apply our method to multi-band images and optical spectra from the Dark Energy Spectroscopic Instrument (DESI) and show that: (1) these embeddings are well aligned between modalities and can be used for accurate cross-modal searches, and (2) these embeddings encode valuable physical information about the galaxies (in particular, redshift and stellar mass) that can be used to achieve competitive zero- and few-shot predictions without further finetuning. Additionally, in the process of developing our approach, we construct a novel transformer-based model and pretraining approach for processing galaxy spectra.
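To make the idea of cross-modal contrastive learning concrete, the following is a minimal, dependency-free sketch of a CLIP-style symmetric InfoNCE objective over paired image and spectrum embeddings, together with cosine-similarity retrieval (cross-modal search) and k-nearest-neighbour regression as one possible way to make zero-shot predictions from frozen embeddings. It is illustrative only and not the AstroCLIP implementation: the function names, the fixed temperature of 0.07, and the use of k-NN regression are assumptions, and the encoders that would produce the embeddings are omitted entirely.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def l2_normalize(v: Vector) -> List[float]:
    """Scale a vector to unit length so that dot products become cosine similarities."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def cosine(u: Vector, w: Vector) -> float:
    """Cosine similarity between two vectors."""
    return sum(a * b for a, b in zip(l2_normalize(u), l2_normalize(w)))


def similarity_logits(image_emb: Sequence[Vector], spec_emb: Sequence[Vector],
                      temperature: float) -> List[List[float]]:
    """Entry [i][j] is the temperature-scaled cosine similarity of image i and spectrum j."""
    return [[cosine(u, w) / temperature for w in spec_emb] for u in image_emb]


def diagonal_cross_entropy(logits: List[List[float]]) -> float:
    """Mean cross-entropy where the correct 'class' for row i is column i (its true pair)."""
    total = 0.0
    for i, row in enumerate(logits):
        m = max(row)  # subtract the row maximum for a numerically stable log-sum-exp
        log_sum_exp = m + math.log(sum(math.exp(x - m) for x in row))
        total += log_sum_exp - row[i]
    return total / len(logits)


def symmetric_infonce(image_emb: Sequence[Vector], spec_emb: Sequence[Vector],
                      temperature: float = 0.07) -> float:
    """CLIP-style loss: average of image->spectrum and spectrum->image cross-entropy.

    Assumption: a fixed temperature; real systems often learn it instead.
    """
    logits = similarity_logits(image_emb, spec_emb, temperature)
    transposed = [list(col) for col in zip(*logits)]
    return 0.5 * (diagonal_cross_entropy(logits) + diagonal_cross_entropy(transposed))


def top_k_by_cosine(query: Vector, candidates: Sequence[Vector], k: int = 1) -> List[int]:
    """Cross-modal search: indices of the k candidates most similar to the query."""
    scores = [cosine(query, c) for c in candidates]
    return sorted(range(len(candidates)), key=lambda j: scores[j], reverse=True)[:k]


def knn_regress(query: Vector, train_emb: Sequence[Vector],
                train_targets: Sequence[float], k: int = 3) -> float:
    """Hypothetical zero-shot estimate (e.g. of redshift): mean target of the k nearest embeddings."""
    idx = top_k_by_cosine(query, train_emb, k)
    return sum(train_targets[j] for j in idx) / len(idx)
```

As a usage example, with `images = [[1.0, 0.1], [0.1, 1.0]]` and `spectra = [[0.9, 0.0], [0.0, 1.1]]`, the loss on the correctly paired batch is far lower than on the same batch with the spectra order reversed, and `top_k_by_cosine(images[0], spectra)` retrieves spectrum `0`. The symmetric form penalizes mismatches in both retrieval directions, which is what allows a single shared embedding space to serve image-to-spectrum and spectrum-to-image searches.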