Existing causal models for link prediction assume an underlying set of inherent node factors (innate characteristics fixed at each node's birth) that govern the causal evolution of links in the graph. In some causal tasks, however, link formation is path-dependent: the outcome of a link intervention depends on the links that already exist. Existing causal methods are not designed for path-dependent link formation, because the cascading functional dependencies between links that arise from path dependence are either unidentifiable or require an impractical number of control variables. To overcome this limitation, we develop the first causal model capable of handling path dependencies in link prediction. We introduce the concept of causal lifting, an invariance in causal models of independent interest that, on graphs, allows causal link prediction queries to be identified from limited interventional data. Further, we show that structural pairwise embeddings exhibit lower bias and correctly represent the task's causal structure, unlike existing node embeddings such as graph neural network node embeddings and matrix factorization. Finally, we validate our theoretical findings in three causal link prediction scenarios: knowledge base completion, covariance matrix estimation, and consumer-product recommendations.