Automatic Differentiation of Binned Likelihoods With RooFit and Clad

RooFit is a toolkit for statistical modeling and fitting used by most experiments in particle physics. As data sets from next-generation experiments grow, processing requirements for physics analysis become more computationally demanding, necessitating performance optimizations for RooFit. One possibility to speed up minimization and add stability is the use of Automatic Differentiation (AD). Unlike for numerical differentiation, the computation cost scales linearly with the number of parameters, making AD particularly appealing for statistical models with many parameters. In this paper, we report on one possible way to implement AD in RooFit. Our approach is to add a facility that automatically generates C++ code for a full RooFit model. Unlike the original RooFit model, this generated code is free of virtual function calls and other RooFit-specific overhead. In particular, this code is then used to produce the gradient automatically with Clad. Clad is a source-transformation AD tool implemented as a plugin to the clang compiler, which automatically generates the derivative code for input C++ functions. We show results demonstrating the improvements observed when applying this code generation strategy to HistFactory and other commonly used RooFit models. HistFactory is the subcomponent of RooFit that implements binned likelihood models with probability densities based on histogram templates. These models frequently have a very large number of free parameters and are thus an interesting first target for AD support in RooFit.
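
To make the idea of a flat, generated likelihood function concrete, here is a minimal sketch in plain C++. It is not RooFit output and does not use Clad: the `BinnedModel` struct and the functions `nll`, `nllGradient`, and `nllGradientNumeric` are hypothetical names chosen for illustration. The model assumes that the expected yield in each bin is a sum of histogram templates scaled by free parameters, and the gradient is written by hand to stand in for the kind of derivative code a source-transformation tool might emit. A central-difference gradient is included for comparison; it needs two likelihood evaluations per parameter.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical flat model: expected yield in bin i is sum_s mu[s] * templ[s][i].
struct BinnedModel {
    std::vector<std::vector<double>> templ; // templ[s][i]: template s, bin i
    std::vector<double> observed;           // observed count in each bin
};

// Expected yield in bin i for parameter values mu.
double expectedYield(const BinnedModel& m, const std::vector<double>& mu, std::size_t i) {
    double nu = 0.0;
    for (std::size_t s = 0; s < mu.size(); ++s) nu += mu[s] * m.templ[s][i];
    return nu;
}

// Poisson negative log-likelihood, dropping the parameter-independent log(n!) term.
double nll(const BinnedModel& m, const std::vector<double>& mu) {
    double sum = 0.0;
    for (std::size_t i = 0; i < m.observed.size(); ++i) {
        const double nu = expectedYield(m, mu, i);
        sum += nu - m.observed[i] * std::log(nu);
    }
    return sum;
}

// Hand-written analytic gradient (a stand-in for AD-generated code):
// d(nll)/d(mu[s]) = sum_i (1 - n_i / nu_i) * templ[s][i].
std::vector<double> nllGradient(const BinnedModel& m, const std::vector<double>& mu) {
    std::vector<double> grad(mu.size(), 0.0);
    for (std::size_t i = 0; i < m.observed.size(); ++i) {
        const double w = 1.0 - m.observed[i] / expectedYield(m, mu, i);
        for (std::size_t s = 0; s < mu.size(); ++s) grad[s] += w * m.templ[s][i];
    }
    return grad;
}

// Central finite differences: 2 * mu.size() evaluations of nll per gradient.
std::vector<double> nllGradientNumeric(const BinnedModel& m, std::vector<double> mu,
                                       double h = 1e-6) {
    std::vector<double> grad(mu.size(), 0.0);
    for (std::size_t s = 0; s < mu.size(); ++s) {
        const double original = mu[s];
        mu[s] = original + h;
        const double up = nll(m, mu);
        mu[s] = original - h;
        const double down = nll(m, mu);
        mu[s] = original; // restore before moving to the next parameter
        grad[s] = (up - down) / (2.0 * h);
    }
    return grad;
}
```

In this sketch the analytic gradient visits each bin once regardless of how the step size is chosen, whereas the numerical version repeats the full likelihood evaluation for every parameter and depends on the choice of `h`.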

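Automatic differentiation

In mathematics and computer algebra, automatic differentiation, sometimes called algorithmic differentiation, is a method for computing the derivative of a function specified by a computer program. The two traditional approaches to differentiation are (1) differentiating the function's expression symbolically and evaluating the result at a given point, and (2) using finite differences. The main drawbacks of symbolic differentiation are its slow speed and the difficulty of converting a computer program into a single expression. In addition, many functions become complicated when higher-order derivatives are computed. Finite differences have two important drawbacks: round-off error in the discretization process and cancellation error. Both traditional approaches suffer from growing complexity and error when computing higher-order derivatives. Automatic differentiation addresses these problems.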
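
The contrast between finite differences and automatic differentiation can be sketched with forward-mode dual numbers in C++. This is an illustrative toy and not how Clad works internally (Clad transforms source code rather than overloading operators); the `Dual` type and the functions `f`, `derivativeForward`, and `derivativeFiniteDiff` are hypothetical names introduced here. Each `Dual` carries a value and its derivative, and every arithmetic operation propagates both according to the usual differentiation rules.

```cpp
#include <cmath>

// A value together with its derivative with respect to one chosen input.
struct Dual {
    double val;
    double der;
};

// Sum rule.
Dual operator+(Dual a, Dual b) { return {a.val + b.val, a.der + b.der}; }

// Product rule.
Dual operator*(Dual a, Dual b) { return {a.val * b.val, a.der * b.val + a.val * b.der}; }

// Chain rule for sine.
Dual sin(Dual a) { return {std::sin(a.val), std::cos(a.val) * a.der}; }

// Example function f(x) = x^3 + sin(x), written once in terms of Dual.
Dual f(Dual x) { return x * x * x + sin(x); }

// Forward-mode AD: seed the input derivative with 1 and read off the result.
double derivativeForward(double x) { return f(Dual{x, 1.0}).der; }

// Forward finite difference using only the value part; the error depends on h.
double derivativeFiniteDiff(double x, double h) {
    return (f(Dual{x + h, 0.0}).val - f(Dual{x, 0.0}).val) / h;
}
```

With this sketch, `derivativeForward(2.0)` agrees with the closed form `3 * 2^2 + cos(2)` up to floating-point rounding, while `derivativeFiniteDiff(2.0, h)` carries a truncation error for large `h` and a cancellation error for very small `h`.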