We present and evaluate the Futhark implementation of reverse-mode automatic differentiation (AD) for the basic building blocks of parallel programming: reduce, prefix sum (scan), and reduce by index. We first derive general-case algorithms and then discuss several specializations that yield efficient differentiation for most cases of practical interest. We report an experiment that evaluates the performance of the differentiated code on GPUs and highlights the impact of the proposed specializations, as well as the strengths and weaknesses of differentiating at a high level versus a low level (i.e., "differentiating the memory").