Deforestation is a compiler optimization that removes intermediate data structure allocations from functional programs to improve their efficiency. The idea is old, but previous approaches have proved limited or impractical. They either work only on compositions of predefined combinators (shortcut fusion), or aggressively unfold recursive definitions until a depth limit is reached or a recurring pattern is found to tie the recursive knot, which leads to impractical algorithmic complexity and large amounts of code duplication. We present Lumberhack, a general-purpose deforestation approach for purely functional call-by-value programs. Lumberhack uses subtype inference to reason about data structure production and consumption, and uses an elaboration pass to fuse the corresponding recursive definitions. It fuses large classes of mutually recursive definitions while avoiding much of the unproductive (and sometimes counter-productive) code duplication inherent in previous approaches. We prove the soundness of Lumberhack using logical relations and experimentally demonstrate significant speedups on the standard nofib benchmark suite.