Hessian-vector product computations arise in many scientific applications, such as optimization and finite element modeling, and often must be performed at many data points concurrently. We propose an automatic differentiation (AD) based method, CHESSFAD (Chunked HESSian using Forward-mode AD), designed for efficient parallel computation of Hessians and Hessian-vector products. CHESSFAD computes second-order derivatives using forward-mode AD and exposes parallelism at multiple levels that can be exploited on accelerators such as NVIDIA GPUs. In the CHESSFAD approach, the computation of each row of the Hessian matrix is independent of the other rows, so rows can be computed concurrently. A second level of parallelism arises because CHESSFAD partitions the computation of each Hessian row into chunks, and different chunks can be computed concurrently. CHESSFAD is implemented as a lightweight header-only C++ library that runs on both CPUs and GPUs. We evaluate the performance of CHESSFAD on large numbers of independent Hessian-vector products over a set of standard test functions and compare it to other existing header-only C++ AD libraries such as {\tt autodiff}. Our results show that CHESSFAD outperforms {\tt autodiff} on all of these functions, with average improvements ranging from 5\% to 50\%.