The growing number of exoplanet discoveries, together with advances in machine learning, has opened new avenues for exploring and understanding worlds beyond our Solar System. In this study, we apply efficient machine learning methods to a dataset of 762 confirmed exoplanets and the eight Solar System planets in order to characterize their fundamental physical quantities. Using several unsupervised clustering algorithms, we divide the data into two main classes, 'small' and 'giant' planets, with cut-off values of $R_{p}=8.13R_{\oplus}$ and $M_{p}=52.48M_{\oplus}$. This classification reveals a clear distinction: giant planets have lower densities, suggesting higher H-He mass fractions, whereas small planets are denser and composed mainly of heavier elements. We then apply a range of regression models to identify correlations among physical parameters and to assess how well they predict exoplanet radius. Our analysis shows that planetary mass, orbital period, and stellar mass play crucial roles in predicting exoplanet radius. Among the models evaluated, Support Vector Regression (SVR) consistently outperforms the others, making it a promising tool for accurate planetary radius estimates. We also derive parametric equations using the M5P and Markov Chain Monte Carlo (MCMC) methods. Notably, small planets exhibit a positive linear mass-radius relation, consistent with previous findings. In contrast, the radii of giant planets correlate strongly with the masses of their host stars, which may offer insight into the relationship between giant planet formation and stellar properties.
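The density contrast between the two classes can be illustrated with a short Python sketch. It computes bulk density from mass and radius in Earth units, $\rho_{p} = \rho_{\oplus}\,(M_{p}/M_{\oplus})/(R_{p}/R_{\oplus})^{3}$, and applies the two cut-offs quoted above to the eight Solar System planets. How the two cut-offs are combined (both exceeded means 'giant', both below means 'small', anything else 'ambiguous') is a hypothetical rule chosen only for illustration. The study derives its classes from clustering, and this is not its pipeline. The planetary values are approximate fact-sheet ratios, and the names `bulk_density`, `classify` and `mean_density_by_class` are illustrative.

```python
from statistics import mean

# Cut-offs quoted in the abstract, in Earth units.
R_CUT = 8.13   # R_p / R_earth
M_CUT = 52.48  # M_p / M_earth

# Mean density of Earth in g/cm^3, used to convert Earth-unit ratios.
RHO_EARTH = 5.513

# Approximate (mass, radius) in Earth units for the eight Solar System planets.
SOLAR_SYSTEM = {
    "Mercury": (0.0553, 0.383),
    "Venus": (0.815, 0.949),
    "Earth": (1.0, 1.0),
    "Mars": (0.107, 0.532),
    "Jupiter": (317.8, 11.21),
    "Saturn": (95.2, 9.45),
    "Uranus": (14.5, 4.01),
    "Neptune": (17.1, 3.88),
}


def bulk_density(mass_earth: float, radius_earth: float) -> float:
    """Bulk density in g/cm^3 from mass and radius given in Earth units.

    Because rho = M / ((4/3) * pi * R**3), the ratio rho / rho_earth
    reduces to (M / M_earth) / (R / R_earth)**3.
    """
    return RHO_EARTH * mass_earth / radius_earth ** 3


def classify(mass_earth: float, radius_earth: float) -> str:
    """Hypothetical labeling rule combining the two cut-offs."""
    above_r = radius_earth > R_CUT
    above_m = mass_earth > M_CUT
    if above_r and above_m:
        return "giant"
    if not above_r and not above_m:
        return "small"
    # Planets that exceed only one cut-off are not assigned a class here.
    return "ambiguous"


def mean_density_by_class(planets: dict) -> dict:
    """Mean bulk density (g/cm^3) for each label produced by classify()."""
    groups: dict = {}
    for mass, radius in planets.values():
        groups.setdefault(classify(mass, radius), []).append(
            bulk_density(mass, radius)
        )
    return {label: mean(values) for label, values in groups.items()}


if __name__ == "__main__":
    for label, rho in sorted(mean_density_by_class(SOLAR_SYSTEM).items()):
        print(f"{label}: mean bulk density {rho:.2f} g/cm^3")
```

Run as a script, it reports a lower mean density for the 'giant' group (Jupiter and Saturn) than for the 'small' group. Under this rule Uranus and Neptune fall below both cut-offs and are labeled 'small'.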