Current state-of-the-art causal models for link prediction assume an underlying set of inherent node factors -- innate characteristics defined at the node's birth -- that govern the causal evolution of links in the graph. In some causal tasks, however, link formation is path-dependent, i.e., the outcome of link interventions depends on existing links. For instance, in the customer-product graph of an online retailer, the effect of an 85-inch TV ad (treatment) likely depends on whether the customer already has an 85-inch TV. Unfortunately, existing causal methods are impractical in these scenarios: the cascading functional dependencies between links (due to path dependence) are either unidentifiable or require an impractical number of control variables. To remedy this shortcoming, this work develops the first causal model capable of dealing with path dependencies in link prediction. It introduces the concept of causal lifting, an invariance in causal models that, when satisfied, allows the identification of causal link prediction queries using limited interventional data. On the estimation side, we show how structural pairwise embeddings -- a type of symmetry-based joint representation of node pairs in a graph -- exhibit lower bias and correctly represent the causal structure of the task, as opposed to existing node embedding methods, e.g., GNNs and matrix factorization. Finally, we validate our theoretical findings on four datasets under three different scenarios for causal link prediction tasks: knowledge base completion, covariance matrix estimation, and consumer-product recommendations.