While machine-learned models are now routinely employed to facilitate astronomical inquiry, model inputs tend to be limited to a primary data source (namely images or time series) and, in the more advanced approaches, some metadata. Yet with the growing use of wide-field, multiplexed observational resources, individual sources of interest often have a broad range of observational modes available. Here we construct an astronomical multimodal dataset and propose AstroM$^3$, a self-supervised pre-training approach that enables a model to learn from multiple modalities simultaneously. Specifically, we extend the CLIP (Contrastive Language-Image Pretraining) model to a trimodal setting, allowing the integration of time-series photometry data, spectra, and astrophysical metadata. In a fine-tuning supervised setting, our results demonstrate that CLIP pre-training improves classification performance for time-series photometry, where accuracy increases from 84.6% to 91.5%. Furthermore, CLIP boosts classification accuracy by up to 12.6% when the availability of labeled data is limited, showing the effectiveness of leveraging larger corpora of unlabeled data. In addition to fine-tuned classification, we can use the trained model in other downstream tasks that were not explicitly contemplated during the construction of the self-supervised model. In particular, we show the efficacy of using the learned embeddings for misclassification identification, similarity search, and anomaly detection. One surprising highlight is the "rediscovery" of Mira subtypes and two rotational variable subclasses using manifold learning and dimensionality reduction algorithms. To our knowledge, this is the first construction of an $n>2$-mode model in astronomy. Extensions to $n>3$ modes are naturally anticipated with this approach.
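The trimodal extension of CLIP described above can be illustrated with a minimal sketch: each modality (photometry, spectra, metadata) is mapped by its own encoder into a shared embedding space, and a symmetric InfoNCE contrastive loss is averaged over all three modality pairs. This is a simplified illustration, not the paper's actual implementation; the pairwise-averaged objective, temperature value, and encoder placeholders are assumptions for demonstration.

```python
import numpy as np

def info_nce(a, b, temperature=0.07):
    """Symmetric InfoNCE loss between two batches of embeddings.

    Matched objects occupy the same row index in `a` and `b`, so the
    positive pairs lie on the diagonal of the similarity matrix.
    """
    a = a / np.linalg.norm(a, axis=1, keepdims=True)  # L2-normalize
    b = b / np.linalg.norm(b, axis=1, keepdims=True)
    logits = a @ b.T / temperature                    # (N, N) cosine sims
    idx = np.arange(len(a))

    def xent(l):
        # cross-entropy with diagonal targets, numerically stabilized
        l = l - l.max(axis=1, keepdims=True)
        log_probs = l - np.log(np.exp(l).sum(axis=1, keepdims=True))
        return -log_probs[idx, idx].mean()

    # average both directions (a -> b and b -> a), as in CLIP
    return 0.5 * (xent(logits) + xent(logits.T))

def trimodal_clip_loss(photometry_emb, spectra_emb, metadata_emb):
    """Average the pairwise contrastive losses over all three modality pairs
    (one assumed way to generalize CLIP's bimodal objective to n=3)."""
    pairs = [(photometry_emb, spectra_emb),
             (photometry_emb, metadata_emb),
             (spectra_emb, metadata_emb)]
    return float(np.mean([info_nce(a, b) for a, b in pairs]))

# Toy batch: 4 objects, 16-dim outputs from three hypothetical encoders.
# Shared structure plus noise stands in for encoders seeing the same object.
rng = np.random.default_rng(0)
z = rng.normal(size=(4, 16))
loss = trimodal_clip_loss(z + 0.1 * rng.normal(size=z.shape),
                          z + 0.1 * rng.normal(size=z.shape),
                          z + 0.1 * rng.normal(size=z.shape))
print(f"trimodal contrastive loss: {loss:.3f}")
```

Because every modality pair shares the same embedding space, the learned embeddings can be reused directly for the downstream tasks mentioned above (similarity search, anomaly detection) via nearest-neighbor queries, without further training.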