Obtaining sparse, interpretable representations of observable data is crucial in many machine learning and signal processing tasks. For data representing flows along the edges of a graph, an intuitively interpretable way to obtain such representations is to lift the graph structure to a simplicial complex. The eigenvectors of the associated Hodge Laplacian, or the incidence matrices of the simplicial complex, then induce a Hodge decomposition, which can be used to represent the observed data in terms of gradient, curl, and harmonic flows. In this paper, we generalize this approach to cellular complexes and introduce the cell inference optimization problem, i.e., the problem of augmenting the observed graph by a set of cells such that the eigenvectors of the associated Hodge Laplacian provide a sparse, interpretable representation of the observed edge flows on the graph. We show that this problem is NP-hard and introduce an efficient approximation algorithm for its solution. Experiments on real-world and synthetic data demonstrate that our algorithm outperforms current state-of-the-art methods while being computationally efficient.
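
To make the Hodge decomposition concrete, the following minimal sketch splits an edge flow into gradient, curl, and harmonic parts by orthogonal projection onto the images of the incidence matrices. It is an illustration under stated assumptions, not the algorithm proposed in the paper: the toy graph, the matrices `B1`, `B2_one`, `B2_two`, and the helper `hodge_decompose` are all hypothetical names chosen for this example. It shows how a flow circulating around a cycle has a nonzero harmonic part until a cell filling that cycle is added.

```python
import numpy as np

# Toy graph (assumed for illustration): nodes 0..3, edges oriented low -> high.
# Edge order: e0=(0,1), e1=(0,2), e2=(1,2), e3=(1,3), e4=(2,3).
# B1 is the node-to-edge incidence matrix: -1 at the tail, +1 at the head.
B1 = np.array([
    [-1, -1,  0,  0,  0],
    [ 1,  0, -1, -1,  0],
    [ 0,  1,  1,  0, -1],
    [ 0,  0,  0,  1,  1],
], dtype=float)

# Edge-to-cell incidence with only the cell 0 -> 1 -> 2 -> 0 present.
B2_one = np.array([[1], [-1], [1], [0], [0]], dtype=float)

# Same complex after also adding the cell 1 -> 2 -> 3 -> 1.
B2_two = np.array([
    [ 1,  0],
    [-1,  0],
    [ 1,  1],
    [ 0, -1],
    [ 0,  1],
], dtype=float)


def hodge_decompose(B1, B2, flow):
    """Split an edge flow into gradient, curl, and harmonic components.

    The gradient part is the projection onto im(B1^T), the curl part is the
    projection onto im(B2), and the harmonic part is the remainder. The two
    images are orthogonal because B1 @ B2 = 0.
    """
    grad = B1.T @ np.linalg.lstsq(B1.T, flow, rcond=None)[0]
    curl = B2 @ np.linalg.lstsq(B2, flow, rcond=None)[0]
    harm = flow - grad - curl
    return grad, curl, harm


# A unit flow circulating around the cycle 1 -> 2 -> 3 -> 1.
flow = np.array([0, 0, 1, -1, 1], dtype=float)

# Without a cell on that cycle, part of the flow is left as harmonic.
grad_a, curl_a, harm_a = hodge_decompose(B1, B2_one, flow)

# With the cell added, the same flow is captured entirely by the curl part.
grad_b, curl_b, harm_b = hodge_decompose(B1, B2_two, flow)

print("harmonic norm, one cell :", np.linalg.norm(harm_a))
print("harmonic norm, two cells:", np.linalg.norm(harm_b))
```

In this toy setting, choosing which cells to add changes how much of the observed flow is explained by the curl component, which is the intuition behind the cell inference problem described above.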