Premise selection is a key bottleneck for scaling theorem proving in large formal libraries. Yet existing language-based methods often treat premises in isolation, ignoring the web of dependencies that connects them. We present a graph-augmented approach that combines dense text embeddings of Lean formalizations with graph neural networks over a heterogeneous dependency graph capturing both state--premise and premise--premise relations. On the LeanDojo Benchmark, our method outperforms the language-based ReProver baseline by over 25% across standard retrieval metrics. These results demonstrate the value of relational structure for more effective premise selection.