Common statistical measures of uncertainty such as $p$-values and confidence intervals quantify the uncertainty due to sampling, that is, the uncertainty due to not observing the full population. However, sampling is not the only source of uncertainty. In practice, distributions change between locations and across time. This makes it difficult to gather knowledge that transfers across data sets. We propose a measure of instability that quantifies the distributional instability of a statistical parameter with respect to Kullback-Leibler divergence, that is, the sensitivity of the parameter under general distributional perturbations within a Kullback-Leibler divergence ball. In addition, we quantify the instability of parameters with respect to directional or variable-specific shifts. Measuring instability with respect to directional shifts can be used to detect the type of shifts a parameter is sensitive to. We discuss how such knowledge can inform data collection for improved estimation of statistical parameters under shifted distributions. We evaluate the performance of the proposed measure on real data and show that it can elucidate the distributional instability of a parameter with respect to certain shifts and can be used to improve estimation accuracy under shifted distributions.
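The abstract describes the measure only in terms of perturbations within a Kullback-Leibler divergence ball. The sketch below is a rough illustration of that idea, not the authors' estimator. It computes how far the mean of a sample can move when the empirical distribution $P$ is replaced by any $Q$ with $\mathrm{KL}(Q \,\|\, P) \le \rho$. It uses a standard fact: the $Q$ that maximises $E_Q[X]$ over such a ball is an exponential tilt $q_i \propto p_i e^{t x_i}$, with $t \ge 0$ chosen so that the KL budget is used up. Because $\mathrm{KL}(Q_t \,\|\, P)$ increases in $t$, a bisection on $t$ is enough to find that tilt. A directional shift in a covariate $w$ is modelled by changing only the marginal distribution of $w$ while keeping $P(y \mid w)$ fixed, which reduces to tilting the conditional means. The function names (`tilt`, `worst_case_mean`, `mean_range`), the radius `rho` and the toy data are all hypothetical.

```python
import math


def tilt(values, probs, t):
    """Exponentially tilt a discrete distribution: q_i ∝ p_i * exp(t * x_i).

    Returns the tilted probabilities, the tilted mean and KL(Q || P).
    """
    m = max(values)
    # Subtracting the maximum keeps exp() from overflowing for large t.
    raw = [p * math.exp(t * (x - m)) for x, p in zip(values, probs)]
    z = sum(raw)
    q = [r / z for r in raw]
    mean = sum(qi * x for qi, x in zip(q, values))
    kl = sum(qi * math.log(qi / p) for qi, p in zip(q, probs) if qi > 0)
    return q, mean, kl


def worst_case_mean(values, weights, rho, iters=200):
    """Largest E_Q[X] over all Q with KL(Q || P) <= rho, where P puts
    (normalised) mass weights[i] on values[i]. Assumes all weights > 0."""
    total = sum(weights)
    probs = [wt / total for wt in weights]
    m = max(values)
    # The largest KL reachable by tilting is -log P(X = max); with a larger
    # budget, all mass can sit on the maximisers.
    p_max = sum(p for p, x in zip(probs, values) if x == m)
    if rho >= -math.log(p_max):
        return m
    # KL(Q_t || P) increases in t >= 0, so bracket the root and bisect.
    lo, hi = 0.0, 1.0
    while tilt(values, probs, hi)[2] < rho and hi < 1e8:
        hi *= 2.0
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if tilt(values, probs, mid)[2] < rho:
            lo = mid
        else:
            hi = mid
    return tilt(values, probs, lo)[1]


def mean_range(values, weights, rho):
    """(smallest, largest) mean over the KL ball of radius rho."""
    lower = -worst_case_mean([-x for x in values], weights, rho)
    return lower, worst_case_mean(values, weights, rho)


# Hypothetical toy data: an outcome y and a discrete covariate w.
y = [1.0, 2.0, 3.0, 4.0, 10.0, 12.0]
w = ["a", "a", "a", "b", "b", "b"]
rho = 0.1

# General shift: any reweighting of the observations within the KL ball.
general = mean_range(y, [1.0] * len(y), rho)

# Directional shift in w: only the marginal of w moves and P(y | w) is kept,
# so E_Q[Y] = sum_g Q(g) * E_P[Y | w = g] and KL(Q || P) = KL(Q_w || P_w).
groups = sorted(set(w))
cond_means = [sum(yi for yi, wi in zip(y, w) if wi == g) / w.count(g) for g in groups]
group_probs = [w.count(g) / len(w) for g in groups]
directional = mean_range(cond_means, group_probs, rho)

print("general shift range:", general)
print("shift in w only:    ", directional)
```

Every directional shift of this kind is also a member of the general KL ball. The interval for the shift in `w` is therefore contained in the general one. In this toy sense, a parameter whose general interval is wide but whose directional interval is narrow is relatively insensitive to shifts in that particular variable.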