The notion of $\alpha$-equivalence between $\lambda$-terms is commonly used to identify terms that are considered equal. However, due to its primitive treatment of free variables, this notion falls short when comparing subterms that occur within a larger context. Depending on whether the Barendregt convention is used (choosing different variable names for all involved binders), it equates either too few or too many subterms. We introduce a formal notion of context-sensitive $\alpha$-equivalence, where two open terms can be compared within a context that resolves their free variables. We show that this equivalence coincides exactly with bisimulation equivalence. Furthermore, we present an efficient algorithm with $O(n\log n)$ running time that identifies $\lambda$-terms modulo context-sensitive $\alpha$-equivalence, improving upon the $O(n\log^2 n)$ bound previously established by Maziarz et al. for hashing modulo ordinary $\alpha$-equivalence. Hashing $\lambda$-terms is useful in many applications that require common subterm elimination and structure sharing. We employ the algorithm to obtain a large-scale, densely packed, interconnected graph of mathematical knowledge from the Coq proof assistant for machine learning purposes.
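The shortcoming of ordinary $\alpha$-equivalence on open subterms can be illustrated with a minimal Python sketch. This is not the algorithm of this work: the names `Var`, `Lam`, `App`, `canon`, and `alpha_eq` are hypothetical, and the check is a standard de Bruijn-style comparison in which free variables are compared by name.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class Var:
    name: str


@dataclass(frozen=True)
class Lam:
    param: str
    body: "Term"


@dataclass(frozen=True)
class App:
    fn: "Term"
    arg: "Term"


Term = Union[Var, Lam, App]


def canon(t, bound=()):
    """Nameless form: bound variables become de Bruijn indices,
    free variables keep their names (illustrative assumption)."""
    if isinstance(t, Var):
        if t.name in bound:
            # bound is ordered innermost-first, so index() finds the nearest binder
            return ("bound", bound.index(t.name))
        return ("free", t.name)
    if isinstance(t, Lam):
        return ("lam", canon(t.body, (t.param,) + bound))
    return ("app", canon(t.fn, bound), canon(t.arg, bound))


def alpha_eq(a, b):
    """Ordinary alpha-equivalence: equal nameless forms."""
    return canon(a) == canon(b)


# Closed terms: \x. x and \y. y are identified, as expected.
id_x = Lam("x", Var("x"))
id_y = Lam("y", Var("y"))
closed_equal = alpha_eq(id_x, id_y)

# Too few: with distinct binder names, the bodies x and y are different
# open terms, although each refers to its directly enclosing binder.
bodies_equal = alpha_eq(id_x.body, id_y.body)

# Too many: with reused names, the innermost x of \x. \z. x and of
# \z. \x. x are equal as open terms, although they refer to different binders.
t1 = Lam("x", Lam("z", Var("x")))
t2 = Lam("z", Lam("x", Var("x")))
inner_equal = alpha_eq(t1.body.body, t2.body.body)
wholes_equal = alpha_eq(t1, t2)
```

Here the subterm comparison ignores the surrounding context entirely, which is the gap a context-sensitive notion is meant to close: the free variables of a subterm would instead be resolved against the binders of the enclosing term.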