Where dual-numbers forward-mode automatic differentiation (AD) pairs each scalar value with its tangent value, dual-numbers reverse-mode AD attempts to achieve reverse AD using a similarly simple idea: pairing each scalar value with a backpropagator function. Its correctness and efficiency on higher-order input languages have been analysed by Brunel, Mazza and Pagani, but their analysis used a custom operational semantics for which it is unclear whether an efficient implementation exists. We take inspiration from their use of linear factoring to optimise dual-numbers reverse-mode AD to an algorithm that has the correct complexity and enjoys an efficient implementation in a standard functional language with support for mutable arrays, such as Haskell. Aside from the linear factoring ingredient, our optimisation steps consist of well-known ideas from the functional programming community. We demonstrate the use of our technique by providing a practical implementation that differentiates most of Haskell98. Where previous work on dual-numbers reverse AD has required sequentialisation to construct the reverse pass, we demonstrate that we can apply our technique to task-parallel source programs and generate a task-parallel derivative computation.
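To make the contrast in the first sentence concrete, here is a minimal Haskell sketch of the two naive representations, assuming a single scalar type `Double` and overloading through the `Num` class. The names `Fwd`, `Rev`, `Grad`, `derivFwd` and `gradient` are hypothetical, chosen for illustration. This is only the unoptimised starting point, not the optimised algorithm the abstract describes; accumulating gradients in a `Data.Map` keyed by input index is likewise an assumption of this sketch.

```haskell
import qualified Data.Map.Strict as M

-- Forward mode (hypothetical names): a dual number pairs a value with its tangent.
data Fwd = Fwd Double Double

instance Num Fwd where
  Fwd x dx + Fwd y dy = Fwd (x + y) (dx + dy)
  Fwd x dx * Fwd y dy = Fwd (x * y) (x * dy + y * dx)  -- product rule
  negate (Fwd x dx)   = Fwd (negate x) (negate dx)
  abs    (Fwd x dx)   = Fwd (abs x) (signum x * dx)
  signum (Fwd x _)    = Fwd (signum x) 0
  fromInteger n       = Fwd (fromInteger n) 0          -- constants have zero tangent

-- Derivative of a scalar function: seed the input tangent with 1.
derivFwd :: (Fwd -> Fwd) -> Double -> Double
derivFwd f x = let Fwd _ dx = f (Fwd x 1) in dx

-- Gradient contributions, keyed by input index (an assumption of this sketch).
type Grad = M.Map Int Double

-- Naive reverse mode: each scalar carries a backpropagator that maps an
-- output cotangent to its contribution to the gradient of the inputs.
data Rev = Rev Double (Double -> Grad)

instance Num Rev where
  Rev x bx + Rev y by = Rev (x + y) (\d -> M.unionWith (+) (bx d) (by d))
  Rev x bx * Rev y by = Rev (x * y) (\d -> M.unionWith (+) (bx (y * d)) (by (x * d)))
  negate (Rev x bx)   = Rev (negate x) (bx . negate)
  abs    (Rev x bx)   = Rev (abs x) (\d -> bx (signum x * d))
  signum (Rev x _)    = Rev (signum x) (const M.empty)
  fromInteger n       = Rev (fromInteger n) (const M.empty)

-- Gradient of a multivariate function: input i gets the backpropagator
-- that sends a cotangent d to the one-hot contribution {i: d}; the
-- output's backpropagator is then called once with cotangent 1.
gradient :: ([Rev] -> Rev) -> [Double] -> [Double]
gradient f xs =
  let inputs   = [Rev x (\d -> M.singleton i d) | (i, x) <- zip [0 ..] xs]
      Rev _ bp = f inputs
      g        = bp 1
  in [M.findWithDefault 0 i g | i <- [0 .. length xs - 1]]
```

Loaded in GHCi, `gradient (\[x, y] -> x * y + x) [3, 4]` evaluates to `[5.0,3.0]`. Note that each use of a shared value invokes its backpropagator again (in `let y = x * x in y * y`, the backpropagator of `y` runs twice), so the cost of this naive version can grow exponentially with the depth of sharing; removing this kind of blow-up is what the optimisations described above are for.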