Statistical analysis is increasingly confronted with complex data from metric spaces. Petersen and Müller (2019) established a general paradigm of Fréchet regression with metric space valued responses and Euclidean predictors. However, the local approach therein relies on nonparametric kernel smoothing and suffers from the curse of dimensionality. To address this issue, we propose a novel random forest weighted local Fréchet regression paradigm. The main mechanism of our approach is a locally adaptive kernel generated by random forests. Our first method uses these weights as a local average to solve for the conditional Fréchet mean, while the second method performs local linear Fréchet regression; both substantially improve on existing Fréchet regression methods. Based on the theory of infinite order U-processes and infinite order $M_{m_n}$-estimators, we establish consistency, rate of convergence, and asymptotic normality for our local constant estimator, which covers the current large sample theory of random forests with Euclidean responses as a special case. Numerical studies show the superiority of our methods for several commonly encountered types of responses, such as distribution functions, symmetric positive-definite matrices, and spherical data. The practical merits of our proposals are also demonstrated through applications to human mortality distribution data and New York taxi data.
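To make the local constant idea concrete, here is a minimal sketch in Python (standard library only) of forest-generated weights plugged into a weighted Fréchet mean. Several things here are assumptions for illustration and are not taken from the paper: the trees split at the median of a randomly chosen coordinate and ignore the responses (the abstract does not state the authors' splitting rule), the responses are one-dimensional distributions stored as quantiles on a fixed grid, and all function names (`simulate`, `fit_forest`, `forest_weights`, `predict`) are hypothetical. The sketch relies on the standard fact that, for one-dimensional distributions under the 2-Wasserstein metric, a weighted Fréchet mean with nonnegative weights is the weighted average of the quantile functions.

```python
import random
from statistics import NormalDist

GRID = [k / 20 for k in range(1, 20)]            # probability levels 0.05, ..., 0.95
Z = [NormalDist().inv_cdf(p) for p in GRID]      # standard normal quantiles on GRID


def simulate(n, seed=0):
    """Toy data (assumption): X uniform on [0,1]^2, Y_i = N(2*x1 + noise, 1) as quantiles."""
    rng = random.Random(seed)
    X = [[rng.random(), rng.random()] for _ in range(n)]
    Q = [[2 * x[0] + rng.gauss(0, 0.1) + z for z in Z] for x in X]
    return X, Q


def build_tree(X, idx, rng, min_leaf):
    """Grow one randomized tree: pick a random coordinate, split at its median."""
    if len(idx) < 2 * min_leaf:
        return {"leaf": idx}
    j = rng.randrange(len(X[0]))
    vals = sorted(X[i][j] for i in idx)
    thr = vals[len(vals) // 2]
    left = [i for i in idx if X[i][j] < thr]
    right = [i for i in idx if X[i][j] >= thr]
    if len(left) < min_leaf or len(right) < min_leaf:
        return {"leaf": idx}
    return {"feat": j, "thr": thr,
            "left": build_tree(X, left, rng, min_leaf),
            "right": build_tree(X, right, rng, min_leaf)}


def fit_forest(X, n_trees=50, subsample=0.5, min_leaf=5, seed=0):
    """Each tree is grown on a subsample drawn without replacement."""
    rng = random.Random(seed)
    size = max(1, int(subsample * len(X)))
    return [build_tree(X, rng.sample(range(len(X)), size), rng, min_leaf)
            for _ in range(n_trees)]


def leaf_of(tree, x):
    """Return the training indices sharing a leaf with the query point x."""
    while "leaf" not in tree:
        tree = tree["left"] if x[tree["feat"]] < tree["thr"] else tree["right"]
    return tree["leaf"]


def forest_weights(trees, n, x):
    """Weight of sample i: average over trees of 1{i in leaf(x)} / |leaf(x)|."""
    w = [0.0] * n
    for tree in trees:
        leaf = leaf_of(tree, x)
        for i in leaf:
            w[i] += 1.0 / (len(leaf) * len(trees))
    return w


def weighted_frechet_mean(Q, w):
    """Weighted Fréchet mean of 1-D distributions under the 2-Wasserstein metric."""
    return [sum(wi * q[k] for wi, q in zip(w, Q)) for k in range(len(Q[0]))]


def predict(trees, Q, x):
    """Local constant fit: weighted Fréchet mean with forest weights at x."""
    return weighted_frechet_mean(Q, forest_weights(trees, len(Q), x))


if __name__ == "__main__":
    X, Q = simulate(200)
    trees = fit_forest(X)
    fit = predict(trees, Q, [0.5, 0.5])
    print(round(fit[len(fit) // 2], 2))  # estimated median of the fitted distribution
```

For other response types, such as symmetric positive-definite matrices or points on a sphere, only `weighted_frechet_mean` would need to change, since the weights depend on the predictors alone in this simplified version.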