Regression with random data objects as responses is becoming increasingly common in modern data analysis. Unfortunately, as in traditional regression with Euclidean data, regression with random object responses is not immune to the problems caused by unusual observations. We propose a metric Cook's distance that extends the classical Cook's distance of Cook (1977) to general metric-valued response objects. An extensive experimental study demonstrates the performance of the metric Cook's distance in regression with both Euclidean and non-Euclidean responses and Euclidean predictors. A real-data analysis of county-level COVID-19 transmission in the United States further illustrates the usefulness of the method in practice.
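As background for the extension described above, the classical Cook's distance of Cook (1977) for a linear model with p coefficients measures how far all n fitted values move when observation i is deleted: D_i = Σ_j (ŷ_j − ŷ_j(i))² / (p s²), where ŷ_j(i) is the fit obtained without observation i and s² is the residual mean square. The minimal NumPy sketch below computes D_i in two ways, by explicit leave-one-out refits and by the standard hat-matrix identity D_i = e_i² h_ii / (p s² (1 − h_ii)²), and checks that they agree. It covers only the Euclidean case; the function names are illustrative, and it is not the paper's metric Cook's distance, whose exact definition is not given in this abstract.

```python
import numpy as np


def cooks_distance_loo(X, y):
    """Classical Cook's distance via explicit leave-one-out refits.

    X: (n, p) design matrix, including an intercept column if one is wanted.
    y: (n,) response vector.
    """
    n, p = X.shape
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    fitted = X @ beta
    resid = y - fitted
    s2 = resid @ resid / (n - p)  # residual mean square
    D = np.empty(n)
    for i in range(n):
        keep = np.arange(n) != i
        beta_i, *_ = np.linalg.lstsq(X[keep], y[keep], rcond=None)
        fitted_i = X @ beta_i  # fits at all n points without observation i
        # Sum of squared changes in the fitted values, scaled by p * s^2.
        D[i] = np.sum((fitted - fitted_i) ** 2) / (p * s2)
    return D


def cooks_distance_closed_form(X, y):
    """Classical Cook's distance via the hat-matrix identity (no refits)."""
    n, p = X.shape
    H = X @ np.linalg.solve(X.T @ X, X.T)  # hat matrix
    h = np.diag(H)  # leverages h_ii
    resid = y - H @ y
    s2 = resid @ resid / (n - p)
    return resid ** 2 / (p * s2) * h / (1 - h) ** 2
```

For example, with a simple linear model where one point has both high leverage and a shifted response, both functions return the same vector, and that point has by far the largest distance. The leave-one-out form compares whole fitted curves through a squared distance, which is the form that most naturally suggests a metric-space analogue; how the paper defines its normalization for non-Euclidean responses is not stated here.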