Among the many types of statistical intervals, highest-density regions (HDRs) stand out for their ability to summarize a distribution or sample effectively, revealing its most salient features. An HDR is the smallest set that attains a given probability coverage, and current methods for computing HDRs require knowledge or estimation of the underlying probability distribution or density $f$. In this work, we illustrate a broader framework for computing HDRs, which generalizes the classical density quantile method introduced in the seminal paper of Hyndman (1996). The framework is based on neighbourhood measures, i.e., measures that preserve the order induced in the sample by $f$ and that include the density $f$ as a special case. We explore a number of suitable distance-based measures, such as the $k$-nearest-neighbour distance, and some probabilistic variants based on copula models. An extensive comparison shows the advantages of the copula-based strategy, especially in scenarios that exhibit complex structures (e.g., multimodality or particular dependence patterns). Finally, we discuss the practical implications of our findings for estimating HDRs in real-world applications.
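To make the idea concrete, the following is a minimal sketch, not the authors' implementation, of how a density quantile rule can be applied to any score that preserves the ordering induced by $f$. It assumes a Gaussian kernel density estimate as the density-based score and the negated distance to the $k$-th nearest neighbour as a distance-based score. The function names (`hdr_membership`, `kde_scores`, `knn_scores`), the simulated bimodal sample, and the choices `k=10` and `coverage=0.9` are all hypothetical and chosen only for illustration.

```python
import numpy as np
from scipy.spatial import cKDTree
from scipy.stats import gaussian_kde


def hdr_membership(scores, coverage=0.95):
    """Flag sample points whose score is at or above the (1 - coverage) sample quantile."""
    threshold = np.quantile(scores, 1.0 - coverage)
    return scores >= threshold, threshold


def kde_scores(sample):
    """Density-based score: Gaussian KDE evaluated at the sample points."""
    # gaussian_kde expects data with shape (n_dims, n_points)
    return gaussian_kde(sample.T)(sample.T)


def knn_scores(sample, k=10):
    """Distance-based score: negated distance to the k-th nearest neighbour."""
    tree = cKDTree(sample)
    # Query k + 1 neighbours because each point is its own nearest neighbour
    dist, _ = tree.query(sample, k=k + 1)
    # Negate so that a larger score corresponds to a denser neighbourhood
    return -dist[:, -1]


rng = np.random.default_rng(0)
# Hypothetical bimodal two-dimensional sample (assumption, for illustration only)
sample = np.vstack([
    rng.normal(loc=-2.0, scale=0.5, size=(300, 2)),
    rng.normal(loc=2.0, scale=0.5, size=(300, 2)),
])

in_hdr_kde, _ = hdr_membership(kde_scores(sample), coverage=0.9)
in_hdr_knn, _ = hdr_membership(knn_scores(sample, k=10), coverage=0.9)
```

Both boolean masks mark roughly 90% of the sample as lying inside the estimated HDR. Only the score changes between the two calls, which is the sense in which the density is one special case of an order-preserving measure. The copula-based variants mentioned above are not sketched here.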