The field of radio astronomy is seeing a sharp increase in the volume of data produced each day as new radio telescopes are commissioned. One of the most crucial problems in this field is the automatic classification of extragalactic radio sources based on their morphologies. Most recent work on the morphological classification of extragalactic radio sources has proposed classifiers based on convolutional neural networks (CNNs). This work instead proposes gradient boosting methods combined with principal component analysis (PCA) as data-efficient alternatives to CNNs. Recent findings have shown that gradient boosting methods can outperform deep learning methods on classification problems with tabular data. The gradient boosting methods considered in this work are based on the XGBoost, LightGBM, and CatBoost implementations. This work also studies the effect of dataset size on classifier performance. This work considers a three-class classification problem based on the three main Fanaroff-Riley classes (class 0, class I, and class II), using radio sources from the Best-Heckman sample. All three gradient boosting methods outperformed a state-of-the-art CNN-based classifier while using less than a quarter of the number of images, with CatBoost achieving the highest accuracy. This was mainly due to their superior classification of Fanaroff-Riley class II sources, for which they achieved 3--4\% higher recall.