StarEmbed: Benchmarking Time Series Foundation Models on Astronomical Observations of Variable Stars

Time series foundation models (TSFMs) are increasingly being adopted as highly capable general-purpose time series representation learners. Although their training corpora are vast, they exclude astronomical time series data. Observations of stars produce petascale time series with unique challenges, including irregular sampling and heteroskedasticity. We introduce StarEmbed, the first public benchmark for rigorous and standardized evaluation of state-of-the-art TSFMs on stellar time series observations ("light curves"). We benchmark on three scientifically motivated downstream tasks: unsupervised clustering, supervised classification, and out-of-distribution source detection. StarEmbed integrates a catalog of expert-vetted labels with multivariate light curves from the Zwicky Transient Facility, yielding ~40k hand-labeled light curves spread across seven astrophysical classes. We evaluate the zero-shot representation capabilities of three TSFMs (MOIRAI, Chronos, Chronos-Bolt) and a domain-specific transformer (Astromer) against handcrafted feature extraction, the long-standing baseline in the astrophysics literature. Our results demonstrate that these TSFMs, especially the Chronos models, which are trained on data completely unlike the astronomical observations, can outperform established astrophysics-specific baselines in some tasks and generalize effectively to entirely new data. In particular, TSFMs deliver state-of-the-art performance on our out-of-distribution source detection benchmark. With this first benchmark of TSFMs on astronomical time series data, we test the limits of their generalization and motivate a paradigm shift in time-domain astronomy, from task-specific, fully supervised pipelines toward generic foundation model representations for the analysis of petascale datasets from forthcoming observatories.
