Improving embedding of graphs with missing data by soft manifolds

Embedding graphs in continuous spaces is a key factor in designing and developing algorithms for automatic information extraction to be applied in diverse tasks (e.g., learning, inferring, predicting). The reliability of graph embeddings directly depends on how closely the geometry of the continuous space matches the graph structure. Manifolds are mathematical structures whose topological spaces can incorporate graph characteristics, in particular node distances. State-of-the-art manifold-based graph embedding algorithms rely on the assumption that the projection onto a tangent space of each point in the manifold (corresponding to a node in the graph) locally resembles a Euclidean space. Although this condition helps in achieving efficient analytical solutions to the embedding problem, it is not an adequate setup for modern real-life graphs, which are characterized by weighted connections across nodes, often computed over sparse datasets with missing records. In this work, we introduce a new class of manifolds, named soft manifolds, that can address this situation. In particular, soft manifolds are mathematical structures with spherical symmetry where the tangent spaces at each point are hypocycloids whose shape is defined according to the velocity of information propagation across the data points. Using soft manifolds for graph embedding, we can provide continuous spaces to pursue any task in data analysis over complex datasets. Experimental results on reconstruction tasks on synthetic and real datasets show that the proposed approach enables more accurate and reliable characterization of graphs in continuous spaces than the state of the art.
