Over the past three decades, boosting has emerged as a useful machine learning technique and has attracted increasing attention. Most advances in this area, however, have focused on numerical implementation procedures and often lack rigorous theoretical justification. Moreover, these approaches are generally designed for fully observed data, and their validity can be compromised when observations are missing. In this paper, we employ semiparametric estimation approaches to develop boosting prediction methods for data with missing responses. We explore two strategies for adjusting the loss functions to account for the effects of missingness. The proposed methods are implemented using a functional gradient descent algorithm, and their theoretical properties, including algorithm convergence and estimator consistency, are rigorously established. Numerical studies demonstrate that the proposed methods perform well in finite-sample settings.
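The abstract does not name the two loss adjustments, so the sketch below illustrates only one common way to adjust a loss for responses that are missing at random: inverse probability weighting (IPW), where each observed response is weighted by 1/π(x), with π(x) the probability that the response is observed. It is a hypothetical illustration, not the authors' method. It assumes squared-error loss, regression stumps on a single feature as base learners, and a known propensity π(x). In practice π(x) would have to be estimated. The names `fit_stump` and `ipw_l2_boost` are made up for this example. Each boosting step is a functional gradient descent step: compute the negative gradient (the residuals) of the weighted loss, fit a base learner to it, and move the fit a small step ν in that direction.

```python
import numpy as np

def fit_stump(x, r, w):
    """Weighted least-squares regression stump on a 1-D feature.
    Returns (threshold, left_value, right_value) minimizing sum w * (r - h(x))**2."""
    order = np.argsort(x)
    xs, rs, ws = x[order], r[order], w[order]
    cw, cwr, cwrr = np.cumsum(ws), np.cumsum(ws * rs), np.cumsum(ws * rs * rs)
    # Left block = first i+1 sorted points, right block = the rest.
    lw, lwr, lwrr = cw[:-1], cwr[:-1], cwrr[:-1]
    rw, rwr, rwrr = cw[-1] - lw, cwr[-1] - lwr, cwrr[-1] - lwrr
    # Weighted SSE of each block around its weighted mean.
    sse = (lwrr - lwr**2 / lw) + (rwrr - rwr**2 / rw)
    sse = np.where(xs[1:] > xs[:-1], sse, np.inf)  # split only between distinct x
    i = int(np.argmin(sse))
    return 0.5 * (xs[i] + xs[i + 1]), lwr[i] / lw[i], rwr[i] / rw[i]

def ipw_l2_boost(x, y, observed, pi, n_steps=200, nu=0.1):
    """Hypothetical IPW-weighted L2 boosting: missing responses drop out of the
    loss, and observed ones are weighted by 1 / pi(x) (pi assumed known here)."""
    xo, yo = x[observed], y[observed]
    w = 1.0 / pi[observed]
    f0 = np.sum(w * yo) / np.sum(w)  # constant minimizing the weighted loss
    f = np.full(len(xo), f0)
    stumps = []
    for _ in range(n_steps):
        r = yo - f  # negative gradient of 0.5 * (y - f)**2 at the current fit
        thr, cl, cr = fit_stump(xo, r, w)  # weighted LS fit of the residuals
        f += nu * np.where(xo <= thr, cl, cr)  # shrunken functional step
        stumps.append((thr, cl, cr))

    def predict(xnew):
        out = np.full(len(xnew), f0)
        for thr, cl, cr in stumps:
            out += nu * np.where(xnew <= thr, cl, cr)
        return out

    return predict

# Simulated example: responses are missing at random, more often for small x.
rng = np.random.default_rng(0)
n = 500
x = rng.uniform(0, 1, n)
y = np.sin(2 * np.pi * x) + rng.normal(0, 0.3, n)
pi = 0.3 + 0.6 * x  # assumed known P(response observed | x)
observed = rng.uniform(0, 1, n) < pi
predict = ipw_l2_boost(x, y, observed, pi)
grid = np.linspace(0.05, 0.95, 50)
mse = np.mean((predict(grid) - np.sin(2 * np.pi * grid)) ** 2)
```

For squared loss, fitting each stump by weighted least squares to the residuals selects the stump that most reduces the IPW-weighted loss. This keeps every step consistent with the adjusted objective and does not require imputing the missing responses.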