Recent advances in areas such as natural language processing and computer vision rely on intricate, massive models trained on vast amounts of unlabeled or partly labeled data, and training or deploying these state-of-the-art methods in resource-constrained environments remains a challenge. Galaxy morphologies are crucial to understanding the processes by which galaxies form and evolve, and efficient methods to classify them are required to extract physical information from modern astronomy surveys. In this paper, we introduce Astroformer, a method for learning from less data. We propose a hybrid transformer-convolutional architecture that draws inspiration from the success of CoAtNet and MaxViT. Concretely, we use the transformer-convolutional hybrid with a new stack design for the network and a different way of creating a relative self-attention layer, and pair it with a careful selection of data augmentation and regularization techniques. Our approach sets a new state of the art for predicting galaxy morphologies from images, a science objective, on the Galaxy10 DECals dataset, which consists of 17736 labeled images: it achieves 94.86% top-$1$ accuracy, beating the previous state of the art for this task by 4.62%. Furthermore, this approach also sets a new state of the art on CIFAR-100 and Tiny ImageNet. We also find that models and training methods used for larger datasets often do not work well in the low-data regime.