Regression with random data objects is becoming increasingly common in modern data analysis. Unfortunately, this kind of regression is not immune to the trouble caused by unusual observations. A metric Cook's distance is proposed that extends the original Cook's distance of Cook (1977) to regression of metric-space-valued response objects on Euclidean predictors. Its performance is demonstrated in an extensive experimental study covering regression in four different response spaces. Two real data applications, one analyzing distributions of COVID-19 transmission in the State of Texas and the other analyzing structural brain connectivity networks, illustrate the utility of the proposed method in practice.