Traditional measures based solely on pairwise associations often fail to capture the complex statistical structure of multivariate data. Existing approaches for identifying information shared among $d>3$ variables are frequently computationally intractable, asymmetric with respect to a target variable, or unable to account for all the ways in which the joint probability distribution can be factorised. Here we present a systematic framework based on lattice theory to derive higher-order information-theoretic measures for multivariate data. Our construction uses pairs consisting of a lattice and an operator function, whereby the operator function is applied over a lattice that represents the algebraic relationships among variables. We show that many commonly used measures can be derived within this framework, yet they fail to capture all interactions for $d>3$, either because they are defined on restricted sublattices or because the typical choice of the KL divergence as the operator function causes groups of interactions to be disregarded. To fully characterise all interactions among $d$ variables, we introduce the Streitberg Information, which is defined over the full partition lattice and uses generalised divergences (beyond KL) as operator functions. We validate the Streitberg Information on synthetic data and illustrate its application to detecting complex interactions among stocks, decoding neural signals, and performing feature selection in machine learning.
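To make the phrase "defined over the full partition lattice" concrete, the following minimal Python sketch enumerates every partition of $d=4$ variables and attaches to each the classical Möbius weight $(-1)^{|\pi|-1}(|\pi|-1)!$ of the partition lattice. This is an illustration under stated assumptions, not the construction used in this work: the helper names `set_partitions` and `mobius_weight` and the variable labels are hypothetical, and the operator functions (generalised divergences) are not reproduced here.

```python
from math import factorial


def set_partitions(items):
    """Yield every partition of `items` as a list of blocks (each block a list)."""
    if not items:
        yield []
        return
    first, rest = items[0], items[1:]
    for partition in set_partitions(rest):
        # Place `first` into each existing block in turn ...
        for i in range(len(partition)):
            yield partition[:i] + [[first] + partition[i]] + partition[i + 1:]
        # ... or give `first` a block of its own.
        yield [[first]] + partition


def mobius_weight(partition):
    """Classical Moebius weight (-1)^(k-1) * (k-1)! for a partition with k blocks.

    Assumption: this is the standard weight on the partition lattice; it is used
    here only to illustrate the lattice structure.
    """
    k = len(partition)
    return (-1) ** (k - 1) * factorial(k - 1)


# Hypothetical variable labels for d = 4.
variables = ["X1", "X2", "X3", "X4"]
lattice = list(set_partitions(variables))
weights = [mobius_weight(p) for p in lattice]

for partition, weight in zip(lattice, weights):
    blocks = " | ".join(",".join(block) for block in partition)
    print(f"{weight:+d}  {blocks}")
```

For $d=4$ the lattice has 15 partitions (the fourth Bell number), and the weights sum to zero. A measure restricted to a sublattice would, by construction, skip some of these factorisations of the joint distribution.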