The comparison of frequency distributions is a common statistical task with broad applications and a long history of methodological development. However, existing measures do not quantify the magnitude and direction by which one distribution is shifted relative to another. In the present study, we define distributional shift (DS) as the concentration of frequencies away from the greatest discrete class, e.g., a histogram's right-most bin. We derive a measure of DS based on the sum of cumulative frequencies, which intuitively quantifies shift as a statistical moment. We then define relative distributional shift (RDS) as the difference in DS between two distributions. Using simulated random sampling, we demonstrate that RDS is strongly related to measures commonly used to compare frequency distributions. Focusing on a specific use case, i.e., simulated healthcare Evaluation and Management coding profiles, we show how RDS can be used to examine many pairs of empirical and expected distributions via shift-significance plots. Compared with other measures, RDS has the unique advantage of being a signed (directional) measure based on a simple difference in an intuitive property.
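To make the "sum of cumulative frequencies" idea concrete, the following is a minimal sketch under stated assumptions, not the paper's exact formula. It assumes counts are given over ordered discrete classes (for example, five E&M levels), that DS is the sum of the cumulative relative frequencies over the first k − 1 classes scaled by 1/(k − 1) so that it lies in [0, 1], and that RDS is DS(observed) minus DS(expected). The normalization, the sign convention, and the function names `distributional_shift` and `relative_distributional_shift` are hypothetical choices made for illustration.

```python
from itertools import accumulate


def distributional_shift(counts):
    """Hypothetical DS: normalized sum of cumulative relative frequencies.

    Assumption: DS = (F_1 + ... + F_{k-1}) / (k - 1), where F_i is the
    cumulative relative frequency up to ordered class i. DS is 1 when all
    mass sits in the first (lowest) class and 0 when all mass sits in the
    last (greatest) class.
    """
    k = len(counts)
    if k < 2:
        raise ValueError("need at least two ordered classes")
    total = sum(counts)
    if total <= 0:
        raise ValueError("counts must sum to a positive number")
    # Cumulative relative frequencies F_1, ..., F_k.
    cumulative = list(accumulate(c / total for c in counts))
    # F_k is always 1, so it carries no information about shift; drop it.
    return sum(cumulative[:-1]) / (k - 1)


def relative_distributional_shift(observed, expected):
    """Hypothetical RDS: signed difference in DS (observed minus expected).

    Under this assumed sign convention, a positive value means the observed
    distribution is concentrated further from the greatest class than the
    expected one.
    """
    return distributional_shift(observed) - distributional_shift(expected)


if __name__ == "__main__":
    expected = [1, 1, 1, 1, 1]  # illustrative uniform profile over 5 classes
    observed = [0, 0, 0, 0, 10]  # illustrative profile piled in the top class
    print(distributional_shift(expected))
    print(relative_distributional_shift(observed, expected))
```

Under this assumed definition, swapping the order of summation gives DS = (k − mean class index)/(k − 1), where classes are indexed 1 to k. This is one way to see why a sum of cumulative frequencies behaves like a first moment. A profile concentrated entirely in the greatest class has DS = 0, so its RDS against a uniform profile is negative.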