In Racket, the LLVM IR, Rust, and other modern languages, programmers and static analyses can hint, with special annotations, that certain parts of a program are unreachable. Same as other assumptions about undefined behavior; the compiler assumes these hints are correct and transforms the program aggressively. While compile-time transformations due to undefined behavior often perplex compiler writers and developers, we show that the essence of transformations due to unreachable code can be distilled in a surprisingly small set of simple formal rules. Specifically, following the well-established tradition of understanding linguistic phenomena through calculi, we introduce the first calculus for unreachable. Its term-rewriting rules that take advantage of unreachable fall into two groups. The first group allows the compiler to delete any code downstream of unreachable, and any effect-free code upstream of unreachable. The second group consists of rules that eliminate conditional expressions when one of their branches is unreachable. We show the correctness of the rules with a novel logical relation, and we examine how they correspond to transformations due to unreachable in Racket and LLVM.
翻译:在Racket、LLVM IR、Rust等现代编程语言中,程序员和静态分析工具可通过特殊注解提示程序的某些部分不可达。与关于未定义行为的其他假设类似,编译器假定这些提示正确并据此对程序进行激进变换。虽然由未定义行为引发的编译时变换常使编译器开发者与程序员感到困惑,但我们证明不可达代码引发的变换本质可通过一组数量极少且形式简单的规则来精炼表达。具体而言,沿袭通过演算理解语言现象的成熟传统,我们首次提出了不可达代码演算。该演算中利用不可达性的项重写规则分为两类:第一类允许编译器删除不可达代码下游的所有代码,以及不可达代码上游的所有无副作用代码;第二类规则用于消除至少一个分支不可达的条件表达式。我们通过一种新颖的逻辑关系证明了这些规则的正确性,并考察了它们如何对应Racket和LLVM中由不可达性引发的程序变换。