Most link prediction methods return estimates of the connection probability of missing edges in a graph. Such output can be used to rank the missing edges from most to least likely to be a true edge, but it does not directly classify them as true or non-existent. In this work, we consider the problem of identifying a set of true edges while controlling the false discovery rate (FDR). We propose a novel method based on high-level ideas from the conformal inference literature. The graph structure induces intricate dependence in the data, which we carefully account for: this dependence sets our problem apart from the usual conformal inference setup, where the data are assumed to be exchangeable. We demonstrate FDR control empirically on both simulated and real data.
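For orientation, the sketch below shows the standard exchangeable-case recipe that the abstract contrasts against. It is not the method proposed here: it computes conformal p-values for candidate edges against calibration scores from pairs assumed to be non-edges, then applies the Benjamini–Hochberg (BH) procedure to select edges at a target FDR level. The function names `conformal_pvalues` and `benjamini_hochberg`, the convention that higher scores mean "more likely an edge", and the availability of exchangeable null calibration scores are all illustrative assumptions. The last of these is exactly what graph dependence breaks.

```python
import numpy as np


def conformal_pvalues(calib_null_scores, test_scores):
    """Conformal p-values for H0_j: 'candidate pair j is not a true edge'.

    Hypothetical helper. Assumes higher score = more likely an edge, and that
    calibration scores come from pairs known to be non-edges, exchangeable
    with the null candidates (an assumption graph dependence violates).
    """
    calib = np.sort(np.asarray(calib_null_scores, dtype=float))
    test = np.asarray(test_scores, dtype=float)
    n = calib.size
    # Count calibration scores >= each test score (side="left" counts those strictly below).
    n_geq = n - np.searchsorted(calib, test, side="left")
    return (1.0 + n_geq) / (n + 1.0)


def benjamini_hochberg(pvalues, alpha):
    """Step-up BH procedure; returns sorted indices of rejected nulls (selected edges)."""
    p = np.asarray(pvalues, dtype=float)
    m = p.size
    order = np.argsort(p)
    thresholds = alpha * np.arange(1, m + 1) / m
    passing = p[order] <= thresholds
    if not passing.any():
        return np.array([], dtype=int)
    k = np.max(np.nonzero(passing)[0])  # largest rank satisfying p_(k) <= alpha * k / m
    return np.sort(order[: k + 1])


# Usage with hypothetical scores: 99 null calibration pairs, 3 candidate pairs.
calib = np.arange(1, 100) / 100
p = conformal_pvalues(calib, [1.5, 2.0, 0.5])  # p is approximately [0.01, 0.01, 0.51]
selected = benjamini_hochberg(p, alpha=0.1)     # selects candidates 0 and 1
```

With only `n` calibration pairs, the smallest attainable p-value is `1 / (n + 1)`, so small calibration sets can make any selection impossible at strict FDR levels.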