Astronomers often work with data in which both the covariates and the dependent variable are measured with heteroscedastic, non-Gaussian errors. For instance, while the TESS and Kepler datasets provide a wealth of information, accounting for measurement errors and systematic biases is critical for extracting reliable scientific insights and improving the performance of machine learning models. Although techniques exist for estimating regression parameters from such data, few can construct prediction intervals with finite-sample coverage guarantees. To address this gap, we tailor the conformal prediction approach to this setting. We empirically demonstrate that the method gives finite-sample control over Type I error probabilities under a variety of assumptions on the measurement errors in the observed data. Further, we demonstrate how conformal prediction can be used to construct prediction intervals for unobserved exoplanet masses using established broken power-law relationships between mass and radius from the literature.
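As a rough illustration of the underlying idea, the following is a minimal sketch of generic split conformal prediction, not the tailored procedure developed in this work. It assumes a single power law fitted by least squares in log-log space (rather than a broken power law), uses absolute residuals as the conformity score, and makes no special correction for measurement error. The function name `split_conformal_interval` and all variable names are hypothetical.

```python
import numpy as np


def split_conformal_interval(x_train, y_train, x_cal, y_cal, x_new, alpha=0.1):
    """Split conformal prediction interval around a straight-line fit.

    Illustrative assumption: x and y are log-radius and log-mass, so the
    straight line corresponds to a single power law.
    """
    # Fit the point predictor on the training split only.
    slope, intercept = np.polyfit(x_train, y_train, deg=1)

    def predict(x):
        return slope * np.asarray(x, dtype=float) + intercept

    # Conformity scores: absolute residuals on the held-out calibration split.
    scores = np.sort(np.abs(np.asarray(y_cal) - predict(x_cal)))
    n = len(scores)

    # Finite-sample quantile index: the ceil((n + 1)(1 - alpha))-th smallest score.
    k = int(np.ceil((n + 1) * (1 - alpha)))
    # If the calibration set is too small, only the trivial interval is valid.
    q = scores[k - 1] if k <= n else np.inf

    center = predict(x_new)
    return center - q, center + q
```

Under exchangeability of the calibration and new points, an interval built this way covers the new response with probability at least 1 - alpha, whatever the noise distribution. A usage sketch would fit on one half of a catalogue of (log-radius, log-mass) pairs, calibrate on the other half, and call the function with the log-radii of planets whose masses are unobserved.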