Regression with random object responses is becoming increasingly common in modern data analysis. As in traditional regression with Euclidean data, however, regression with random object responses is not immune to the problems caused by unusual observations. A metric Cook's distance is proposed that extends the classical Cook's distance of Cook (1977) to general metric-space-valued response objects. An extensive experimental study demonstrates the performance of the metric Cook's distance in regression with Euclidean predictors and either Euclidean or non-Euclidean responses. A real data analysis of county-level COVID-19 transmission in the United States further illustrates the usefulness of the method in practice.