Probability-turbulence divergence: A tunable allotaxonometric instrument for comparing heavy-tailed categorical distributions

Real-world complex systems often comprise many distinct types of elements as well as many more types of networked interactions between elements. When the relative abundances of types can be measured well, we further observe heavy-tailed categorical distributions for type frequencies. For comparing the type frequency distributions of two systems, or of a single system at different points in time (a facet of allotaxonometry), a wide range of probability divergences is available. Here, we introduce and explore `probability-turbulence divergence', a tunable, straightforward, and interpretable instrument for comparing normalizable categorical frequency distributions. We model probability-turbulence divergence (PTD) after rank-turbulence divergence (RTD). While probability-turbulence divergence is more limited in application than rank-turbulence divergence, it is more sensitive to changes in type frequency. We build allotaxonographs to display probability turbulence, incorporating a way to visually accommodate zero probabilities for `exclusive types', which are types that appear in only one system. We explore comparisons of example distributions taken from literature, social media, and ecology. We show how probability-turbulence divergence either explicitly or functionally generalizes many existing kinds of distances and measures, including, as special cases, $L^{(p)}$ norms, the S{\o}rensen-Dice coefficient (the $F_1$ statistic), and the Hellinger distance. We discuss similarities with the generalized entropies of R{\'e}nyi and Tsallis, and with the diversity indices (or Hill numbers) from ecology. We close with thoughts on open problems concerning the optimization of the tuning of rank- and probability-turbulence divergence.
