Traditional models based solely on pairwise associations often fail to capture the complex statistical structure of multivariate data. Existing approaches for identifying information shared among groups of $d>3$ variables are frequently computationally intractable, asymmetric with respect to a target variable, or unable to account for all factorisations of the joint probability distribution. We present a systematic framework that derives higher-order information-theoretic measures from pairs consisting of a lattice and an operator function: the lattice represents the algebraic relationships among variables, and the operator function computes the measure over the lattice. We show that many commonly used measures can be derived within this framework, but that they are often restricted to sublattices of the partition lattice, which prevents them from capturing all interactions when $d>3$. We also demonstrate that KL divergence, when used as an operator function, leads to unwanted cancellation of interactions for $d>3$. To fully characterise all interactions among $d$ variables, we introduce the Streitberg Information, which uses generalisations of KL divergence as the operator function and is defined over the full partition lattice. We validate Streitberg Information numerically on synthetic data, and illustrate its application in analysing complex interactions among stocks, decoding neural signals, and performing feature selection in machine learning.
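
As a minimal illustration of the two ingredients named above, the following Python sketch enumerates the full partition lattice of $d$ variables, attaches to each partition the standard Möbius coefficient $(-1)^{|\pi|-1}(|\pi|-1)!$ of the partition lattice, and evaluates the plain KL divergence between a discrete joint distribution and each factorisation into block marginals. It is not the definition of Streitberg Information, which the abstract does not spell out; the function names (`set_partitions`, `mobius_to_top`, `factorised`, `kl`) and the XOR toy distribution are assumptions made purely for illustration.

```python
from itertools import product
from math import factorial, log


def set_partitions(items):
    """Yield every partition of `items` as a list of blocks (lists)."""
    items = list(items)
    if not items:
        yield []
        return
    first, rest = items[0], items[1:]
    for partial in set_partitions(rest):
        # Either `first` forms a block of its own ...
        yield [[first]] + partial
        # ... or it joins one of the existing blocks.
        for i in range(len(partial)):
            yield partial[:i] + [[first] + partial[i]] + partial[i + 1:]


def mobius_to_top(partition):
    """Moebius coefficient of the partition lattice: (-1)^(k-1) (k-1)!."""
    k = len(partition)
    return (-1) ** (k - 1) * factorial(k - 1)


def marginal(joint, block):
    """Marginal pmf of the variables indexed by `block`."""
    out = {}
    for x, p in joint.items():
        key = tuple(x[i] for i in block)
        out[key] = out.get(key, 0.0) + p
    return out


def factorised(joint, partition):
    """Product of block marginals, evaluated on the support of `joint`."""
    margs = [(block, marginal(joint, block)) for block in partition]
    q = {}
    for x in joint:
        value = 1.0
        for block, m in margs:
            value *= m[tuple(x[i] for i in block)]
        q[x] = value
    return q


def kl(p, q):
    """KL divergence D(p || q) in nats, summed over the support of p."""
    return sum(pv * log(pv / q[x]) for x, pv in p.items() if pv > 0)


# Toy joint (an assumption for illustration): X2 = X0 XOR X1 with uniform
# X0, X1. Every pair of variables is independent, yet the triple is not.
xor_joint = {(a, b, a ^ b): 0.25 for a, b in product((0, 1), repeat=2)}

for pi in set_partitions(range(3)):
    q = factorised(xor_joint, pi)
    print(pi, "coefficient:", mobius_to_top(pi), "KL:", round(kl(xor_joint, q), 4))
```

For the XOR example, every factorisation other than the single-block partition has KL divergence $\log 2$ from the joint, although all pairwise marginals are independent, which is the kind of structure pairwise models miss. The number of partitions grows as the Bell numbers (5 for $d=3$, 15 for $d=4$, 52 for $d=5$), so a measure restricted to a sublattice considers only a subset of these factorisations.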