Statistical analysis is increasingly confronted with complex data from metric spaces. Petersen and Müller (2019) established a general paradigm of Fréchet regression with metric-space-valued responses and Euclidean predictors. However, the local approach therein relies on nonparametric kernel smoothing and suffers from the curse of dimensionality. To address this issue, this paper proposes a novel random forest weighted local Fréchet regression paradigm. The main mechanism of our approach is a locally adaptive kernel generated by random forests. Our first method uses these weights to form a local average that estimates the conditional Fréchet mean, while the second method performs local linear Fréchet regression; both significantly improve on existing Fréchet regression methods. Based on the theory of infinite order U-processes and infinite order $M_{m_n}$-estimators, we establish the consistency, rate of convergence, and asymptotic normality of our local constant estimator, which covers the current large sample theory of random forests with Euclidean responses as a special case. Numerical studies show the superiority of our methods for several commonly encountered types of responses, such as distribution functions, symmetric positive-definite matrices, and spherical data. The practical merits of our proposals are also demonstrated through an application to human mortality distribution data.
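The local constant idea can be illustrated with a minimal sketch. It is not the authors' implementation, and every name in it is hypothetical. It assumes the forest weight of training point i at a query x is the average, over trees, of the indicator that i shares a leaf with x divided by that leaf's size. It also assumes the responses are one-dimensional distributions stored as quantile functions on a common grid, in which case the weighted Fréchet mean under the 2-Wasserstein metric is the pointwise weighted average of the quantile functions. The leaf indices are taken as given; in practice they might come from a fitted forest, for example via scikit-learn's `apply` method.

```python
def forest_weights(train_leaves, query_leaves):
    """Random-forest weights of each training point at a single query point.

    train_leaves[i][b]: leaf id of training point i in tree b (assumed input).
    query_leaves[b]:    leaf id of the query point in tree b (assumed input).
    """
    n = len(train_leaves)
    n_trees = len(query_leaves)
    weights = [0.0] * n
    for b in range(n_trees):
        # Training points that fall in the same leaf as the query in tree b.
        members = [i for i in range(n) if train_leaves[i][b] == query_leaves[b]]
        if not members:
            continue  # a leaf may hold no training points if trees are subsampled
        for i in members:
            weights[i] += 1.0 / (len(members) * n_trees)
    return weights


def wasserstein_frechet_mean(quantiles, weights):
    """Weighted Frechet mean of 1-D distributions under the 2-Wasserstein metric.

    quantiles[i][k]: quantile function of response i on a shared grid.
    With nonnegative weights the minimizer is the normalized pointwise
    weighted average of the quantile functions.
    """
    total = sum(weights)
    grid_size = len(quantiles[0])
    return [
        sum(w * q[k] for w, q in zip(weights, quantiles)) / total
        for k in range(grid_size)
    ]


# Toy example: three training points, two trees.
train_leaves = [[0, 0], [0, 1], [1, 1]]
query_leaves = [0, 1]
w = forest_weights(train_leaves, query_leaves)
quantiles = [[0.0, 1.0], [2.0, 3.0], [4.0, 5.0]]
estimate = wasserstein_frechet_mean(quantiles, w)
```

For other response types, such as symmetric positive-definite matrices or points on a sphere, the same weights would enter a weighted sum of squared distances that generally has to be minimized numerically rather than in closed form.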