Lookup tables (finite maps) are a ubiquitous data structure. In pure functional languages they are best represented using trees instead of hash tables. In pure functional languages within constructive logic, without a primitive integer type, they are well represented using binary tries instead of search trees. In this work, we introduce canonical binary tries, an improved binary-trie data structure that enjoys a natural extensionality property, quite useful in proofs, and supports sparseness more efficiently. We provide full proofs of correctness in Coq. We provide microbenchmark measurements of canonical binary tries versus several other data structures for finite maps, in a variety of application contexts; as well as measurement of canonical versus original tries in two big, real systems. The application context of data structures contained in theorem statements imposes unusual requirements for which canonical tries are particularly well suited.
翻译:查找表(有限映射)是一种无处不在的数据结构。在纯函数式语言中,用树结构而非哈希表来表示它们是最佳方案。而在构造性逻辑的纯函数式语言中,由于缺乏原始整数类型,用二叉字典树而非搜索树来表示它们更为合适。本文提出了规范二叉字典树,这是一种改进的二叉字典树数据结构,它具备自然的扩展性属性(这一属性在证明中非常有用),并能更高效地支持稀疏性。我们提供了在Coq中的完整正确性证明。我们展示了规范二叉字典树与若干其他有限映射数据结构在多种应用场景下的微基准测试结果,以及规范字典树与原始字典树在两个大型实际系统中的性能对比。定理陈述中包含的数据结构应用场景对性能提出了特殊要求,而规范字典树恰好能很好地满足这些要求。