When empirical objects are represented as discrete probability distributions, within-distribution summaries such as Shannon entropy and Hill-type diversity indices describe how probability mass is spread inside each object, whereas the Kullback-Leibler (KL) divergence provides pairwise, asymmetric information. This note focuses on the KL difference $\Delta_{\mathrm{KL}}(p,q)=D_{\mathrm{KL}}(p\|q)-D_{\mathrm{KL}}(q\|p)$. Although $\Delta_{\mathrm{KL}}$ can add information beyond within-distribution summaries and symmetric overlap, its sign does not, by itself, establish support inclusion, coverage, or breadth. It is better understood as a weighted category-wise log-ratio contrast that reflects asymmetric placement of probability mass. This becomes clear once the definition is written out. The aim of this note is therefore to present this interpretation in a compact, example-based form, together with a descriptive bibliometric illustration based on COVID-19-related preprint-server topic distributions.
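To make the "weighted log-ratio contrast" reading concrete: when $p$ and $q$ share full support, expanding the two divergences and combining terms gives $\Delta_{\mathrm{KL}}(p,q)=\sum_i (p_i+q_i)\log(p_i/q_i)$. The following is a minimal Python sketch of that identity, not the note's own implementation. The function names (`kl`, `delta_kl`, `delta_kl_contrast`) and the two example distributions are hypothetical, chosen only for illustration, and natural logarithms are assumed.

```python
import math


def kl(p, q):
    """D_KL(p || q) in nats. Assumes q[i] > 0 wherever p[i] > 0."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def delta_kl(p, q):
    """KL difference: D_KL(p || q) - D_KL(q || p)."""
    return kl(p, q) - kl(q, p)


def delta_kl_contrast(p, q):
    """Same quantity as a category-wise log-ratio contrast.

    Each category contributes log(p_i / q_i) weighted by (p_i + q_i).
    Assumes every p_i and q_i is strictly positive (full shared support).
    """
    return sum((pi + qi) * math.log(pi / qi) for pi, qi in zip(p, q))


# Hypothetical three-category distributions, for illustration only.
p = [0.7, 0.2, 0.1]
q = [0.4, 0.4, 0.2]

direct = delta_kl(p, q)
contrast = delta_kl_contrast(p, q)
print(direct, contrast)
```

Under these assumptions the two printed values agree up to floating-point error, and swapping the arguments flips the sign. The contrast form shows that the sign depends on where each distribution places its mass relative to the other, category by category.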