We present ZipLex, a verified framework for invertible lexical analysis. Whereas previous verified lexers guarantee only conformance to regular-expression semantics and the maximal munch property, ZipLex additionally guarantees that lexing and printing are mutual inverses. Our design relies on two sets of ideas: (1) a new abstraction of token sequences that captures the separability of tokens in a sequence while supporting their efficient manipulation, and (2) a combination of verified data structures and optimizations, including Huet's zippers and memoized derivatives, to achieve practical performance. We implemented ZipLex in Scala and verified its correctness, including invertibility, using the Stainless verifier. Our evaluation demonstrates that ZipLex supports realistic applications such as JSON processing and lexers for programming languages. Compared with other verified lexers (which do not enforce invertibility), ZipLex is 4x slower than Coqlex and two orders of magnitude faster than Verbatim++, showing that verified invertibility can be achieved without prohibitive cost.
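To make the invertibility property concrete, the following is a minimal, unverified Python sketch of a maximal-munch lexer built on memoized Brzozowski derivatives. It is not ZipLex's Scala implementation: it omits zippers and any proof, and every name in it (`derive`, `lex`, `print_tokens`, `RULES`, the three token kinds) is a hypothetical choice made for illustration. It shows that printing the output of a successful lex returns the input, while lexing a printed token sequence returns that sequence only when adjacent tokens do not merge, which is the informal sense of separability assumed here.

```python
from dataclasses import dataclass
from functools import lru_cache
import string

# Regular expressions as small immutable (hashable) values.
@dataclass(frozen=True)
class Empty:            # matches no string
    pass

@dataclass(frozen=True)
class Eps:              # matches only the empty string
    pass

@dataclass(frozen=True)
class Chars:            # matches one character from a set
    chars: frozenset

@dataclass(frozen=True)
class Seq:
    left: object
    right: object

@dataclass(frozen=True)
class Alt:
    left: object
    right: object

@dataclass(frozen=True)
class Star:
    inner: object

def nullable(r):
    """True if r matches the empty string."""
    if isinstance(r, (Eps, Star)):
        return True
    if isinstance(r, (Empty, Chars)):
        return False
    if isinstance(r, Seq):
        return nullable(r.left) and nullable(r.right)
    return nullable(r.left) or nullable(r.right)  # Alt

@lru_cache(maxsize=None)  # memoize derivatives (illustrative, not ZipLex's scheme)
def derive(r, c):
    """Brzozowski derivative of r with respect to character c."""
    if isinstance(r, (Empty, Eps)):
        return Empty()
    if isinstance(r, Chars):
        return Eps() if c in r.chars else Empty()
    if isinstance(r, Alt):
        return Alt(derive(r.left, c), derive(r.right, c))
    if isinstance(r, Star):
        return Seq(derive(r.inner, c), r)
    head = Seq(derive(r.left, c), r.right)  # r is a Seq
    return Alt(head, derive(r.right, c)) if nullable(r.left) else head

def longest_match(r, text, start):
    """End index of the longest non-empty match of r at start, or -1."""
    best = -1
    for i in range(start, len(text)):
        r = derive(r, text[i])
        if nullable(r):
            best = i + 1
    return best

def lex(rules, text):
    """Maximal munch: take the longest match; earlier rules win ties."""
    tokens, pos = [], 0
    while pos < len(text):
        best_end, best_name = -1, None
        for name, regex in rules:
            end = longest_match(regex, text, pos)
            if end > best_end:
                best_end, best_name = end, name
        if best_end <= pos:
            return None  # no rule matches at this position
        tokens.append((best_name, text[pos:best_end]))
        pos = best_end
    return tokens

def print_tokens(tokens):
    """Printing concatenates the lexemes."""
    return "".join(lexeme for _, lexeme in tokens)

def plus(r):
    return Seq(r, Star(r))

# Hypothetical rule set used only for this example.
RULES = [
    ("IDENT", plus(Chars(frozenset(string.ascii_lowercase)))),
    ("NUM", plus(Chars(frozenset(string.digits)))),
    ("WS", plus(Chars(frozenset(" ")))),
]
```

With these rules, `print_tokens(lex(RULES, "foo 42"))` gives back `"foo 42"`. In the other direction, the sequence `[("IDENT", "a"), ("IDENT", "b")]` prints to `"ab"`, which lexes to the single token `("IDENT", "ab")`, so the round trip fails unless a separating token such as whitespace sits between the two identifiers.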