Most scientific challenges can be framed as one of the following three levels of complexity of function approximation. Type 1: Approximate an unknown function given input/output data. Type 2: Consider a collection of variables and functions, some of which are unknown, indexed by the nodes and hyperedges of a hypergraph (a generalized graph in which an edge can connect more than two vertices). Given partial observations of the variables of the hypergraph (satisfying the functional dependencies imposed by its structure), approximate all the unobserved variables and unknown functions. Type 3: Expanding on Type 2, if the hypergraph structure itself is unknown, use partial observations of the variables of the hypergraph to discover its structure and approximate its unknown functions. While most Computational Science and Engineering and Scientific Machine Learning challenges can be framed as Type 1 and Type 2 problems, many scientific problems can only be categorized as Type 3. Despite their prevalence, these Type 3 challenges have been largely overlooked due to their inherent complexity. Although Gaussian Process (GP) methods are sometimes perceived as a well-founded but old technology limited to Type 1 curve fitting, their scope has recently been expanded to Type 2 problems. In this paper, we introduce an interpretable GP framework for Type 3 problems, targeting the data-driven discovery and completion of computational hypergraphs. Our approach is based on a kernel generalization of Row Echelon Form reduction from linear systems to nonlinear ones, combined with variance-based analysis. Here, variables are linked via GPs, and those contributing to the highest data variance unveil the hypergraph's structure. We illustrate the scope and efficiency of the proposed approach with applications to (algebraic) equation discovery, network discovery (gene pathways, chemical networks, and mechanical networks), and raw data analysis.
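To make the variance-based idea concrete, the following is a minimal sketch and not the paper's algorithm. It assumes an additive kernel with one RBF component per candidate input variable, fits a regularized GP (kernel ridge) regressor to a target variable, and reports the share of the fitted function's squared RKHS norm carried by each component. The function names, the lengthscale, and the nugget value are illustrative assumptions.

```python
import numpy as np

def rbf_gram(x, lengthscale=1.0):
    """Gram matrix of a 1-D RBF kernel on the samples in x (assumed kernel choice)."""
    d = x[:, None] - x[None, :]
    return np.exp(-0.5 * (d / lengthscale) ** 2)

def component_energy_fractions(X, y, lengthscale=1.0, nugget=1e-2):
    """Share of the regressor's squared RKHS norm attributed to each input column.

    Hypothetical helper: uses the additive kernel K = sum_j K_j, one RBF per column.
    """
    n, d = X.shape
    grams = [rbf_gram(X[:, j], lengthscale) for j in range(d)]
    K = sum(grams)
    # Representer coefficients of the regularized GP / kernel ridge regressor.
    alpha = np.linalg.solve(K + nugget * np.eye(n), y)
    # Squared RKHS norm of each additive component f_j = K_j(., X) @ alpha.
    energies = np.array([alpha @ G @ alpha for G in grams])
    return energies / energies.sum()

# Synthetic example: the target depends on column 0 only.
rng = np.random.default_rng(0)
X = rng.uniform(-2.0, 2.0, size=(200, 3))
y = np.sin(2.0 * X[:, 0]) + 0.01 * rng.standard_normal(200)

fractions = component_energy_fractions(X, y)
print(fractions)  # the entry for column 0 should dominate
```

In this toy setting, columns with a small share would be candidates for pruning as ancestors of the target variable; repeating the test for each variable in turn is one way such a procedure could suggest hyperedges.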