While machine-learned models are now routinely employed to facilitate astronomical inquiry, model inputs tend to be limited to a primary data source (namely images or time series) and, in the more advanced approaches, some metadata. Yet with the growing use of wide-field, multiplexed observational resources, individual sources of interest often have a broad range of observational modes available. Here we construct an astronomical multimodal dataset and propose AstroM$^3$, a self-supervised pre-training approach that enables a model to learn from multiple modalities simultaneously. Specifically, we extend the CLIP (Contrastive Language-Image Pretraining) model to a trimodal setting, allowing the integration of time-series photometry data, spectra, and astrophysical metadata. In a fine-tuning supervised setting, our results demonstrate that CLIP pre-training improves classification performance for time-series photometry, where accuracy increases from 84.6% to 91.5%. Furthermore, CLIP boosts classification accuracy by up to 12.6% when the availability of labeled data is limited, showing the effectiveness of leveraging larger corpora of unlabeled data. In addition to fine-tuned classification, we can use the trained model in other downstream tasks that were not explicitly contemplated during the construction of the self-supervised model. In particular, we show the efficacy of using the learned embeddings for misclassification identification, similarity search, and anomaly detection. One surprising highlight is the "rediscovery" of Mira subtypes and two rotational variable subclasses using manifold learning and dimensionality reduction algorithms. To our knowledge, this is the first construction of an $n>2$-mode model in astronomy. Extensions to $n>3$ modes are naturally anticipated with this approach.
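The trimodal extension of CLIP described above can be illustrated with a minimal sketch: each modality (photometry, spectra, metadata) is mapped by its own encoder into a shared embedding space, and a symmetric InfoNCE contrastive loss is averaged over all three modality pairs. This is a simplified illustration, not the paper's actual implementation; the pairwise-averaged objective, temperature value, and encoder placeholders are assumptions for demonstration.

```python
import numpy as np

def info_nce(a, b, temperature=0.07):
    """Symmetric InfoNCE loss between two batches of embeddings.

    Matched objects occupy the same row index in `a` and `b`, so the
    positive pairs lie on the diagonal of the similarity matrix.
    """
    a = a / np.linalg.norm(a, axis=1, keepdims=True)  # L2-normalize
    b = b / np.linalg.norm(b, axis=1, keepdims=True)
    logits = a @ b.T / temperature                    # (N, N) cosine sims
    idx = np.arange(len(a))

    def xent(l):
        # cross-entropy with diagonal targets, numerically stabilized
        l = l - l.max(axis=1, keepdims=True)
        log_probs = l - np.log(np.exp(l).sum(axis=1, keepdims=True))
        return -log_probs[idx, idx].mean()

    # average both directions (a -> b and b -> a), as in CLIP
    return 0.5 * (xent(logits) + xent(logits.T))

def trimodal_clip_loss(photometry_emb, spectra_emb, metadata_emb):
    """Average the pairwise contrastive losses over all three modality pairs
    (one assumed way to generalize CLIP's bimodal objective to n=3)."""
    pairs = [(photometry_emb, spectra_emb),
             (photometry_emb, metadata_emb),
             (spectra_emb, metadata_emb)]
    return float(np.mean([info_nce(a, b) for a, b in pairs]))

# Toy batch: 4 objects, 16-dim outputs from three hypothetical encoders.
# Shared structure plus noise stands in for encoders seeing the same object.
rng = np.random.default_rng(0)
z = rng.normal(size=(4, 16))
loss = trimodal_clip_loss(z + 0.1 * rng.normal(size=z.shape),
                          z + 0.1 * rng.normal(size=z.shape),
                          z + 0.1 * rng.normal(size=z.shape))
print(f"trimodal contrastive loss: {loss:.3f}")
```

Because every modality pair shares the same embedding space, the learned embeddings can be reused directly for the downstream tasks mentioned above (similarity search, anomaly detection) via nearest-neighbor queries, without further training.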