Efficient Algorithms for Attributed Graph Alignment with Vanishing Edge Correlation

Graph alignment is the task of recovering the vertex correspondence between two positively correlated graphs. Polynomial-time algorithms for this problem have been studied extensively under the Erd\H{o}s--R\'enyi graph pair model, in which the two graphs are Erd\H{o}s--R\'enyi graphs with edge probability $q_\mathrm{u}$ that are correlated under a certain vertex correspondence. To achieve exact recovery of the vertex correspondence, all existing algorithms require at least that the edge correlation coefficient $\rho_\mathrm{u}$ between the two graphs satisfy $\rho_\mathrm{u} > \sqrt{\alpha}$, where $\alpha \approx 0.338$ is Otter's tree-counting constant. Moreover, it is conjectured in [1] that no polynomial-time algorithm can achieve exact recovery under weak edge correlation $\rho_\mathrm{u}<\sqrt{\alpha}$. In this paper, we show that with a vanishing amount of additional attribute information, exact recovery is feasible in polynomial time under vanishing edge correlation $\rho_\mathrm{u} \ge n^{-\Theta(1)}$. We identify a local tree structure that incorporates one layer of user information and one layer of attribute information, and we apply the subgraph counting technique to such structures. We propose a polynomial-time algorithm that recovers the vertex correspondence for all but a vanishing fraction of vertices, and we then refine its output to achieve exact recovery. The motivation for considering additional attribute information comes from the side information that is widely available in real-world applications, such as users' birthplaces and educational backgrounds on social networks like LinkedIn and Twitter.
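
For concreteness, one standard way to formalize the edge correlation described above is sketched here; this is an assumption, since the abstract does not spell out the model. Each pair of corresponding user-user edge indicators $(A_{ij}, B_{\pi(i)\pi(j)})$ is taken to be jointly Bernoulli with marginal $q_\mathrm{u}$ and correlation coefficient $\rho_\mathrm{u}$, independently across vertex pairs. The symbols $A$, $B$, $\pi$, and $P$ are illustrative notation and do not appear in the abstract.

$$
\begin{aligned}
P(1,1) &= q_\mathrm{u}^2 + \rho_\mathrm{u}\, q_\mathrm{u}(1-q_\mathrm{u}), \\
P(1,0) = P(0,1) &= (1-\rho_\mathrm{u})\, q_\mathrm{u}(1-q_\mathrm{u}), \\
P(0,0) &= (1-q_\mathrm{u})^2 + \rho_\mathrm{u}\, q_\mathrm{u}(1-q_\mathrm{u}).
\end{aligned}
$$

Under this parameterization the four probabilities sum to one, each marginal equals $q_\mathrm{u}$, and the covariance is $P(1,1) - q_\mathrm{u}^2 = \rho_\mathrm{u}\, q_\mathrm{u}(1-q_\mathrm{u})$, so dividing by the Bernoulli variance $q_\mathrm{u}(1-q_\mathrm{u})$ returns $\rho_\mathrm{u}$. Setting $\rho_\mathrm{u} = 0$ gives independent graphs, and $\rho_\mathrm{u} = 1$ gives identical graphs up to relabeling.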
