We characterize information as risk reduction between knowledge states represented by partitions of the underlying probability space. Entropy corresponds to the risk reduction from no (or partial) knowledge to full knowledge about a random variable, while information corresponds to the risk reduction from no (or partial) knowledge to partial knowledge. This applies to any information measure based on expected loss minimization, such as Bregman information, with Shannon information and variance as prominent examples. In each case, fundamental properties such as the chain rule, non-negativity, and the relationship between information and divergence are preserved. Because partitions form a lattice under refinement, our general treatment reveals how information can be decomposed into redundant, unique, and synergistic contributions, a question important in applications from neuroscience to machine learning, yet one for which existing formulations lack consensus on foundational definitions and can violate basic properties such as the chain rule or non-negativity. Redundancy corresponds to Aumann's common knowledge, synergy to the gap between separately and jointly observed sources, and unique information is necessarily path-dependent, taking different values depending on what is already known. The resulting partial information decomposition is grounded directly in probability theory, avoids treating scalar information quantities as primitive compositional objects, yields non-negative terms by construction, and offers more fine-grained credit assignment.
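The claim that entropy and information are risk reductions under expected loss minimization can be illustrated numerically. The sketch below (an illustrative assumption, not code from the paper) uses a small joint distribution of (X, Y) and computes the same quantity twice: under squared loss, where the optimal predictor is the mean and the residual risk is variance, information is the reduction in variance from learning Y; under log loss, where the optimal predictor is the distribution itself and the residual risk is Shannon entropy, information is the mutual information I(X; Y). Both are non-negative by construction, as the abstract asserts.

```python
import numpy as np

# Illustrative sketch: information as risk reduction for two Bregman losses.
# A random joint distribution p[x, y] over 3 values of X and 2 values of Y;
# observing Y corresponds to moving to a finer partition of the sample space.
rng = np.random.default_rng(0)
p = rng.dirichlet(np.ones(6)).reshape(3, 2)  # joint p(x, y), all entries > 0
px = p.sum(axis=1)                           # marginal of X
py = p.sum(axis=0)                           # marginal of Y
x_vals = np.array([0.0, 1.0, 2.0])           # numeric values of X

# Squared loss: best predictor is the mean, residual risk is variance.
mean_x = x_vals @ px
var_x = ((x_vals - mean_x) ** 2) @ px                  # risk, no knowledge
cond_mean = (x_vals @ p) / py                          # E[X | Y = y]
cond_var = np.array([((x_vals - cond_mean[y]) ** 2) @ (p[:, y] / py[y])
                     for y in range(2)])
exp_cond_var = cond_var @ py                           # risk, knowing Y
info_sq = var_x - exp_cond_var                         # variance explained by Y

# Log loss: best predictor is the distribution, residual risk is entropy.
H = lambda q: float(-(q * np.log2(q)).sum())           # entropies in bits
Hx = H(px)                                             # risk, no knowledge
Hx_given_y = sum(py[y] * H(p[:, y] / py[y]) for y in range(2))
info_log = Hx - Hx_given_y                             # mutual information I(X; Y)

print(f"variance reduction: {info_sq:.4f}, entropy reduction: {info_log:.4f}")
```

In both cases the computation has the identical shape (unconditional risk minus expected conditional risk), which is the sense in which the treatment is loss-agnostic; the squared-loss column also recovers the law of total variance as an instance of the chain rule.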