We characterize information as risk reduction between knowledge states, represented by partitions of the underlying probability space. Entropy corresponds to risk reduction from no (or partial) knowledge to full knowledge about a random variable, while information corresponds to risk reduction from no (or partial) knowledge to partial knowledge. This applies to any information measure based on expected loss minimization, such as Bregman information, with Shannon information and variance as prominent examples. In each case, fundamental properties like the chain rule, non-negativity, and the relationship between information and divergence are preserved. Because partitions form a lattice under refinement, our general treatment reveals how information can be decomposed into redundant, unique, and synergistic contributions. This decomposition question is important in applications from neuroscience to machine learning, yet existing formulations lack consensus on foundational definitions and can violate basic properties such as the chain rule or non-negativity. Redundancy corresponds to Aumann's common knowledge and synergy to the gap between separately and jointly observed sources, while unique information is necessarily path-dependent, taking different values depending on what is already known. The resulting partial information decomposition is grounded directly in probability theory, avoids treating scalar information quantities as primitive compositional objects, and yields non-negative terms by construction.
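To make the risk-reduction reading concrete, here is a minimal numerical sketch (ours, not code from the paper). A toy joint distribution over a source S and a target X is invented for illustration; the knowledge state "S observed" is the partition of the space into the cells {S = s}. With log loss the Bayes risk is entropy and the risk reduction is the mutual information I(X;S); with squared loss the Bayes risk is variance and the risk reduction is the explained variance Var(E[X|S]), both instances of Bregman information.

```python
# Sketch: information as risk reduction between knowledge states,
# shown for two Bregman losses. The joint distribution is invented.
import numpy as np

# Joint distribution p(s, x): rows index the source S, columns the target X.
p = np.array([[0.30, 0.10],
              [0.05, 0.25],
              [0.10, 0.20]])
p_x = p.sum(axis=0)            # marginal of X
p_s = p.sum(axis=1)            # marginal of S
x_vals = np.array([0.0, 1.0])  # numeric values of X, used for squared loss

def log_loss_risk(q):
    """Bayes risk under log loss for X ~ q (all entries positive here):
    the optimal prediction is q itself, and the risk is the entropy."""
    return -np.sum(q * np.log2(q))

def sq_loss_risk(q, vals):
    """Bayes risk under squared loss for X ~ q: the optimal prediction
    is the mean, and the risk is the variance."""
    mean = np.sum(q * vals)
    return np.sum(q * (vals - mean) ** 2)

# Knowledge state "nothing": predict from the marginal of X.
H_X   = log_loss_risk(p_x)         # Shannon entropy H(X)
Var_X = sq_loss_risk(p_x, x_vals)  # variance Var(X)

# Knowledge state "S observed": remaining risk is the p(s)-weighted
# Bayes risk within each cell {S = s} of the partition.
p_x_given_s = p / p_s[:, None]
H_X_given_S    = np.sum(p_s * np.array([log_loss_risk(r) for r in p_x_given_s]))
EVar_X_given_S = np.sum(p_s * np.array([sq_loss_risk(r, x_vals) for r in p_x_given_s]))

# Information = risk reduction from "nothing" to "S observed".
print(f"log loss:     I(X;S) = H(X) - H(X|S)  = {H_X - H_X_given_S:.4f} bits")
print(f"squared loss: Var(X) - E[Var(X|S)]    = {Var_X - EVar_X_given_S:.4f}")
```

In both cases the reduction is non-negative by construction: the optimal prediction from the coarser knowledge state remains available within every cell of the finer partition, so refining knowledge can only lower the expected loss.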