L-systems are a mathematical formalism proposed by the biologist Aristid Lindenmayer to simulate natural structures such as trees, flowers, snowflakes, and other branching phenomena. An L-system is a formal grammar that specifies how strings of symbols are iteratively rewritten, with every symbol replaced in parallel at each step. This paper describes how this formalism can be used to generate synthetic programs in programming languages such as C, C++, Julia, and Go. Because these programs are large and complex, they can be used to test the performance of compilers, operating systems, and computer architectures. This paper demonstrates the usefulness of these benchmarks through multiple case studies: a comparison between clang and gcc; a comparison between C, C++, Julia, and Go; a study of the historical evolution of gcc in terms of code quality; an examination of the effects of profile-guided optimization in gcc; an analysis of the asymptotic behavior of the different phases of clang's compilation pipeline; and a comparison of the many data structures available in the GNOME library (GLib). These case studies demonstrate the advantages of the L-system approach to benchmark generation over fuzzers such as CSmith, which were designed to uncover bugs in compilers rather than to evaluate their performance.
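To make the rewriting idea concrete, the sketch below shows parallel L-system rewriting followed by a toy translation of the resulting word into a C program. It is a minimal illustration under stated assumptions, not the paper's generator: the functions `rewrite` and `to_c`, the rule `F -> F[F]F`, and the mapping of `F` to an arithmetic statement and `[`/`]` to a nested `for` loop are all hypothetical choices made for this example. Lindenmayer's classic algae system (`A -> AB`, `B -> A`) is included as a sanity check, since its word lengths follow the Fibonacci sequence.

```python
# Hypothetical sketch: L-system rewriting plus a toy mapping from symbols to C code.
# Not the paper's actual generator; the rules and C templates are illustrative assumptions.


def rewrite(axiom: str, rules: dict[str, str], steps: int) -> str:
    """Apply the production rules `steps` times, rewriting every symbol in parallel."""
    current = axiom
    for _ in range(steps):
        # Symbols without a rule are copied unchanged (identity production).
        current = "".join(rules.get(symbol, symbol) for symbol in current)
    return current


def to_c(word: str) -> str:
    """Translate a bracket-balanced word into a C program (hypothetical mapping).

    F -> one arithmetic statement on an accumulator
    [ -> open a nested for loop
    ] -> close the innermost loop
    """
    lines = ["#include <stdio.h>", "", "int main(void) {", "    unsigned acc = 0u;"]
    depth = 0    # current loop nesting depth
    stmt_id = 0  # distinct constant per statement so statements differ
    for symbol in word:
        indent = "    " * (depth + 1)
        if symbol == "F":
            # Unsigned arithmetic wraps around, so overflow is well defined in C.
            lines.append(f"{indent}acc = acc * 31u + {stmt_id}u;")
            stmt_id += 1
        elif symbol == "[":
            lines.append(f"{indent}for (int i{depth} = 0; i{depth} < 4; ++i{depth}) {{")
            depth += 1
        elif symbol == "]":
            depth -= 1
            lines.append("    " * (depth + 1) + "}")
    lines.append('    printf("%u\\n", acc);')
    lines.append("    return 0;")
    lines.append("}")
    return "\n".join(lines)


if __name__ == "__main__":
    algae = {"A": "AB", "B": "A"}
    print([rewrite("A", algae, n) for n in range(5)])
    word = rewrite("F", {"F": "F[F]F"}, 2)
    print(word)
    print(to_c(word))
```

Because the right-hand side of the hypothetical rule `F -> F[F]F` is bracket-balanced, every derived word is balanced too, so the emitted C always has matching braces; the number of rewriting steps is the single knob that controls program size and loop-nesting depth, which is what makes a grammar-driven generator convenient for scaling benchmarks.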