Bump hunting deals with finding meaningful data subsets, known as bumps, in sample spaces. These have traditionally been conceived as modal or concave regions in the graph of the underlying density function. We define an abstract bump construct based on curvature functionals of the probability density. Then, we explore several alternative characterizations involving derivatives up to second order. In particular, we propose a suitable implementation of Good and Gaskins' original concave bumps in the multivariate case. Moreover, we bring to exploratory data analysis concepts such as the mean curvature and the Laplacian, which have produced good results in applied domains. Our methodology approximates the curvature functional with a plug-in kernel density estimator. We provide theoretical results that ensure the asymptotic consistency of bump boundaries in the Hausdorff distance with affordable convergence rates. We also present asymptotically valid and consistent confidence regions bounding curvature bumps. The theory is illustrated through several use cases in sports analytics with datasets from the NBA, MLB and NFL. We conclude that the different curvature instances combine effectively to generate insightful visualizations.
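To make the plug-in idea concrete, the following is a minimal sketch of one curvature functional mentioned above, the Laplacian, evaluated on a kernel density estimate. It is an illustration under stated assumptions and not the authors' implementation: it assumes a Gaussian kernel with a single scalar bandwidth `h`, and it assumes a Laplacian bump is taken to be the region where the estimated Laplacian is negative. The function names `kde_laplacian` and `in_laplacian_bump` are hypothetical.

```python
import numpy as np


def kde_laplacian(points, data, h):
    """Plug-in estimate of the Laplacian of a density.

    Assumptions (not from the source): Gaussian kernel, scalar bandwidth h.
    points: array of shape (m, d), locations where the estimate is evaluated.
    data:   array of shape (n, d), the observed sample.
    """
    points = np.atleast_2d(points)
    data = np.atleast_2d(data)
    d = data.shape[1]
    # Pairwise differences between evaluation points and data, shape (m, n, d).
    diff = points[:, None, :] - data[None, :, :]
    sq = np.sum(diff ** 2, axis=2)  # squared distances, shape (m, n)
    # Gaussian kernel K_h(u) = (2*pi*h^2)^(-d/2) * exp(-|u|^2 / (2*h^2)).
    kern = np.exp(-sq / (2 * h ** 2)) / (2 * np.pi * h ** 2) ** (d / 2)
    # Laplacian of the Gaussian kernel: K_h(u) * (|u|^2 / h^4 - d / h^2).
    return np.mean(kern * (sq / h ** 4 - d / h ** 2), axis=1)


def in_laplacian_bump(points, data, h):
    """True where the estimated Laplacian is negative (assumed bump rule)."""
    return kde_laplacian(points, data, h) < 0
```

Evaluating `in_laplacian_bump` on a regular grid gives a boolean mask whose boundary approximates the bump boundary. The bandwidth `h` is left to the user here; estimating second derivatives generally calls for more smoothing than estimating the density itself.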