Models that rely solely on pairwise relationships often fail to capture the complete statistical structure of the complex multivariate data found in diverse domains, such as socio-economic, ecological, or biomedical systems. Non-trivial dependencies between groups of more than two variables can play a significant role in the analysis and modelling of such systems, yet extracting such high-order interactions from data remains challenging. Here, we introduce a hierarchy of $d$-order ($d \geq 2$) interaction measures, increasingly inclusive of possible factorisations of the joint probability distribution, and define non-parametric, kernel-based tests to establish systematically the statistical significance of $d$-order interactions. We also establish mathematical links with lattice theory, which elucidate the derivation of the interaction measures and their composite permutation tests; clarify the connection of simplicial complexes with kernel matrix centring; and provide a means to enhance computational efficiency. We illustrate our results numerically with validations on synthetic data, and through an application to neuroimaging data.
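To make the kernel-based, permutation-tested setting concrete, the sketch below shows what the pairwise base case ($d = 2$) might look like in practice. It is an illustration under stated assumptions, not the construction from this work: it uses the standard biased HSIC statistic with a Gaussian kernel as a stand-in for a 2-order interaction measure, and the function names (`gaussian_gram`, `centre`, `hsic`, `permutation_pvalue`), the fixed bandwidth, and the number of permutations are all hypothetical choices. It does show the two ingredients the abstract refers to: centring of kernel (Gram) matrices and a permutation null distribution.

```python
import numpy as np


def gaussian_gram(x, bandwidth=1.0):
    """Gaussian-kernel Gram matrix for samples x (n,) or (n, p).

    The bandwidth is an assumed fixed value; in practice it is often
    chosen by a heuristic such as the median pairwise distance.
    """
    x = np.asarray(x, dtype=float)
    x = x.reshape(x.shape[0], -1)
    sq_norms = np.sum(x ** 2, axis=1)
    sq_dists = sq_norms[:, None] + sq_norms[None, :] - 2.0 * (x @ x.T)
    sq_dists = np.maximum(sq_dists, 0.0)  # guard against rounding below zero
    return np.exp(-sq_dists / (2.0 * bandwidth ** 2))


def centre(K):
    """Double-centre a Gram matrix: H K H with H = I - (1/n) 1 1^T."""
    n = K.shape[0]
    H = np.eye(n) - np.ones((n, n)) / n
    return H @ K @ H


def hsic(K, L):
    """Biased HSIC estimate, trace(HKH HLH) / n^2, from two Gram matrices."""
    n = K.shape[0]
    return float(np.sum(centre(K) * centre(L))) / n ** 2


def permutation_pvalue(K, L, n_perm=200, seed=0):
    """Permutation p-value for the null hypothesis of independence.

    Permuting the sample order of one variable (rows and columns of L
    together) breaks any dependence while preserving both marginals.
    """
    rng = np.random.default_rng(seed)
    observed = hsic(K, L)
    n = K.shape[0]
    exceed = 0
    for _ in range(n_perm):
        perm = rng.permutation(n)
        if hsic(K, L[np.ix_(perm, perm)]) >= observed:
            exceed += 1
    # Add-one correction keeps the p-value strictly positive.
    return (1 + exceed) / (1 + n_perm)
```

Calling `permutation_pvalue(gaussian_gram(x), gaussian_gram(y))` on paired samples `x` and `y` returns a p-value for pairwise dependence. Extending this to $d > 2$ requires combining centred Gram matrices over several variables according to the factorisations being tested, which is the part this sketch does not attempt.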