RooFit is a toolkit for statistical modeling and fitting used by most experiments in particle physics. As data sets from next-generation experiments grow, the processing requirements for physics analysis become more computationally demanding, necessitating performance optimizations for RooFit. One way to speed up minimization and improve its stability is to use Automatic Differentiation (AD). Unlike with numerical differentiation, the computational cost of AD scales linearly with the number of parameters, making AD particularly appealing for statistical models with many parameters. In this paper, we report on one possible way to implement AD in RooFit. Our approach is to add a facility that automatically generates C++ code for a full RooFit model. Unlike the original RooFit model, this generated code is free of virtual function calls and other RooFit-specific overhead. In particular, this code is then used to produce the gradient automatically with Clad. Clad is a source-transformation AD tool implemented as a plugin to the Clang compiler, which automatically generates the derivative code for input C++ functions. We show results demonstrating the improvements observed when applying this code generation strategy to HistFactory and other commonly used RooFit models. HistFactory is the subcomponent of RooFit that implements binned likelihood models with probability densities based on histogram templates. These models frequently have a very large number of free parameters and are thus an interesting first target for AD support in RooFit.
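To make the idea concrete, the following is a minimal sketch of what a flat, virtual-call-free likelihood function and its gradient could look like. It is not the code RooFit generates and not the output of Clad: the struct `BinnedData`, the functions `nll`, `nllGradient` and `nllGradientNumeric`, and the toy model (one signal strength `mu` plus one scale factor `gamma_i` per bin) are all hypothetical names and assumptions chosen for illustration. The analytic gradient is written by hand to stand in for the kind of derivative code a source-transformation AD tool would produce, and it is compared with a central finite-difference gradient that needs two extra likelihood evaluations per parameter.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical toy data for a binned model with histogram templates.
struct BinnedData {
    std::vector<double> sig; // signal template, one entry per bin
    std::vector<double> bkg; // background template, one entry per bin
    std::vector<double> obs; // observed counts, one entry per bin
};

// Flat negative log-likelihood (constant terms dropped).
// params = {mu, gamma_0, ..., gamma_{n-1}}, expected yield in bin i:
//   nu_i = gamma_i * (mu * sig_i + bkg_i)
double nll(const std::vector<double>& params, const BinnedData& d) {
    assert(params.size() == d.obs.size() + 1);
    const double mu = params[0];
    double result = 0.0;
    for (std::size_t i = 0; i < d.obs.size(); ++i) {
        const double nu = params[i + 1] * (mu * d.sig[i] + d.bkg[i]);
        result += nu - d.obs[i] * std::log(nu);
    }
    return result;
}

// Hand-written gradient, standing in for AD-generated code:
// a single pass over the bins fills all partial derivatives.
std::vector<double> nllGradient(const std::vector<double>& params,
                                const BinnedData& d) {
    assert(params.size() == d.obs.size() + 1);
    const double mu = params[0];
    std::vector<double> grad(params.size(), 0.0);
    for (std::size_t i = 0; i < d.obs.size(); ++i) {
        const double gamma = params[i + 1];
        const double yield = mu * d.sig[i] + d.bkg[i];
        const double nu = gamma * yield;
        const double dNllDNu = 1.0 - d.obs[i] / nu; // d(nll)/d(nu_i)
        grad[0] += dNllDNu * gamma * d.sig[i];      // chain rule to mu
        grad[i + 1] = dNllDNu * yield;              // chain rule to gamma_i
    }
    return grad;
}

// Central finite differences: two full nll evaluations per parameter.
std::vector<double> nllGradientNumeric(const std::vector<double>& params,
                                       const BinnedData& d,
                                       double h = 1e-5) {
    std::vector<double> grad(params.size(), 0.0);
    std::vector<double> shifted = params;
    for (std::size_t k = 0; k < params.size(); ++k) {
        shifted[k] = params[k] + h;
        const double up = nll(shifted, d);
        shifted[k] = params[k] - h;
        const double down = nll(shifted, d);
        shifted[k] = params[k]; // restore before moving to the next parameter
        grad[k] = (up - down) / (2.0 * h);
    }
    return grad;
}
```

In this sketch the hand-written gradient visits every bin once regardless of how many parameters there are, while the finite-difference version repeats the whole likelihood evaluation twice for each parameter, which is the difference in scaling that makes AD attractive for models with many free parameters.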