Most link prediction methods return estimates of the connection probability for the missing edges of a graph. Such output can be used to rank the missing edges from most to least likely to be a true edge, but it does not directly classify them as true or non-existent. In this work, we consider the problem of identifying a set of true edges while controlling the false discovery rate (FDR). We propose a novel method based on high-level ideas from the conformal inference literature. The graph structure induces intricate dependence in the data, which sets this problem apart from the usual conformal inference setup, where exchangeability is assumed, and which we carefully take into account. FDR control is demonstrated empirically on both simulated and real data.
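To make the gap between ranking and classification concrete, here is a minimal sketch of the generic recipe from the conformal inference literature: convert scores to conformal p-values using a calibration set of known non-edges, then apply the Benjamini-Hochberg procedure at level alpha. This is not the method proposed in this work. It assumes the calibration and test scores are exchangeable, which is exactly the assumption that the graph dependence violates. The function names, the convention that a higher score means a more likely edge, and the toy scores are all illustrative assumptions.

```python
import numpy as np


def conformal_p_values(test_scores, null_scores):
    """Conformal p-values for test scores against calibration (null) scores.

    Assumption: a higher score means "more likely a true edge", and
    null_scores come from pairs known to be non-edges.
    """
    test_scores = np.asarray(test_scores, dtype=float)
    null_sorted = np.sort(np.asarray(null_scores, dtype=float))
    n = null_sorted.size
    # Count calibration scores that are >= each test score.
    n_geq = n - np.searchsorted(null_sorted, test_scores, side="left")
    return (1.0 + n_geq) / (n + 1.0)


def benjamini_hochberg(p_values, alpha):
    """Boolean mask of hypotheses rejected by Benjamini-Hochberg at level alpha."""
    p = np.asarray(p_values, dtype=float)
    m = p.size
    order = np.argsort(p)
    thresholds = alpha * np.arange(1, m + 1) / m
    below = np.nonzero(p[order] <= thresholds)[0]
    selected = np.zeros(m, dtype=bool)
    if below.size > 0:
        # Reject every hypothesis up to the largest sorted index passing its threshold.
        selected[order[: below[-1] + 1]] = True
    return selected


# Toy usage with made-up scores (hypothetical, not taken from any experiment).
null_scores = [0.1, 0.2, 0.3, 0.4]   # scores of known non-edges
test_scores = [0.5, 0.25]            # scores of candidate missing edges
p_vals = conformal_p_values(test_scores, null_scores)
declared_edges = benjamini_hochberg(p_vals, alpha=0.5)
```

The mask `declared_edges` marks the candidate pairs declared to be true edges. Under exchangeability this recipe is known to control the FDR, but no such guarantee should be read into it for graph data, where the scores depend on one another through the shared graph structure.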