Parsing underpins a vast range of software engineering tasks, from compilers and static analyzers to language servers and fuzz testing tools. Yet most parsers deployed in practice are deterministic (LL or LR), forcing developers not only to contort their grammars to fit the parser but also to simplify the very languages they design, sacrificing expressiveness for the sake of parseability. General context-free parsers eliminate this constraint. Despite decades of algorithmic development, however, no rigorous head-to-head comparison exists across the major families of parsing algorithms. We present the first unified, controlled benchmark of six generalized parsing algorithms (CYK, Valiant, Earley, GLL, RNGLR, and BRNGLR) alongside deterministic LL(1) and LR(1) baselines, all implemented in Rust with shared data structures and parse-tree extraction, and evaluated across 22 grammars ranging from simple expressions to full C++ and Java. Our results show that the cost of generality is lower than widely assumed. On deterministic grammars, the GLR family incurs only a 3x median slowdown over LR(1), with narrow and predictable variance. GLR is the clear performance winner among generalized parsers and a practical default choice for software engineering tools.