MiraBest: A Dataset of Morphologically Classified Radio Galaxies for Machine Learning

The volume of data from current and future observatories has motivated the increased development and application of automated machine learning methodologies for astronomy. However, less attention has been given to the production of standardised datasets for assessing the performance of different machine learning algorithms within astronomy and astrophysics. Here we describe in detail the MiraBest dataset, a publicly available batched dataset of 1256 radio-loud AGN from NVSS and FIRST, filtered to $0.03 < z < 0.1$ and manually labelled by Miraghaei and Best (2017) according to the Fanaroff-Riley morphological classification. The dataset was created for machine learning applications and is compatible with standard deep learning libraries. We outline the principles underlying the construction of the dataset, the sample selection and pre-processing methodology, and the dataset structure and composition, and we compare MiraBest to other datasets used in the literature. Existing applications that utilise the MiraBest dataset are reviewed, and an extended dataset of 2100 sources is created by cross-matching MiraBest with other catalogues of radio-loud AGN that have been used more widely in the literature for machine learning applications.

