An Evaluation and Comparison of GPU Hardware and Solver Libraries for Accelerating the OPM Flow Reservoir Simulator

Realistic reservoir simulation is known to become prohibitively expensive in computation time as the accuracy of the simulation is increased or the model grid is enlarged. One way to address this issue is to parallelize the computation by dividing the model into several partitions and using multiple CPUs to compute the result with techniques such as MPI and multi-threading. Alternatively, GPUs are good candidates for accelerating the computation because their massively parallel architecture allows many floating-point operations per second to be performed. The iterative numerical solver typically takes the most computation time and is challenging to execute efficiently because of the dependencies that exist between cells in the model. In this work, we evaluate the OPM Flow simulator and compare several state-of-the-art GPU solver libraries, as well as custom-developed solutions, for a BiCGStab solver with an ILU0 preconditioner, and we benchmark their performance against the default DUNE library implementation running on multiple CPU processors using MPI. The evaluated GPU software includes a manually implemented linear solver in OpenCL and the integration of several third-party sparse linear algebra libraries, such as cuSparse, rocSparse, and amgcl. For our benchmarking, we use small, medium, and large use cases, starting with the public test case NORNE, which includes approximately 50k active cells, and ending with a large model that includes approximately 1 million active cells. We find that a GPU can accelerate a single dual-threaded MPI process by up to 5.6 times, and that its performance is comparable to that of around 8 dual-threaded MPI processes.
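To make the solver configuration named above concrete, the following is a minimal pure-Python sketch of BiCGStab with an ILU0 preconditioner. It is an illustration only, not the OPM Flow, DUNE, or GPU library implementation: the sparse storage (one `{column: value}` dictionary per row), the 5-point Laplacian test matrix, and all function names (`laplacian_2d`, `ilu0`, `ilu0_solve`, `bicgstab`) are assumptions made for this example.

```python
import math

def laplacian_2d(nx, ny):
    """Hypothetical test matrix: 5-point Laplacian, one {column: value} dict per row."""
    rows = []
    for iy in range(ny):
        for ix in range(nx):
            i = iy * nx + ix
            row = {i: 4.0}
            if ix > 0:
                row[i - 1] = -1.0
            if ix < nx - 1:
                row[i + 1] = -1.0
            if iy > 0:
                row[i - nx] = -1.0
            if iy < ny - 1:
                row[i + nx] = -1.0
            rows.append(row)
    return rows

def matvec(A, x):
    return [sum(v * x[j] for j, v in row.items()) for row in A]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def ilu0(A):
    """Incomplete LU with zero fill-in: only entries in A's sparsity pattern are updated."""
    LU = [dict(row) for row in A]
    for i in range(len(LU)):
        row = LU[i]
        for k in sorted(c for c in row if c < i):
            row[k] /= LU[k][k]                  # multiplier L[i][k]
            for j, u_kj in LU[k].items():
                if j > k and j in row:          # skip positions that would be fill-in
                    row[j] -= row[k] * u_kj
    return LU

def ilu0_solve(LU, r):
    """Apply the preconditioner: solve L y = r (unit diagonal), then U z = y."""
    y = list(r)
    for i in range(len(LU)):                    # forward substitution
        y[i] -= sum(v * y[j] for j, v in LU[i].items() if j < i)
    for i in reversed(range(len(LU))):          # backward substitution
        y[i] = (y[i] - sum(v * y[j] for j, v in LU[i].items() if j > i)) / LU[i][i]
    return y

def bicgstab(A, b, LU, tol=1e-10, max_iter=200):
    """Right-preconditioned BiCGStab with a zero initial guess; assumes b is nonzero."""
    n = len(b)
    x = [0.0] * n
    r = list(b)                                 # residual for x = 0
    r_hat = list(r)                             # fixed shadow residual
    rho = alpha = omega = 1.0
    v = [0.0] * n
    p = [0.0] * n
    stop = tol * math.sqrt(dot(b, b))
    for it in range(1, max_iter + 1):
        rho_new = dot(r_hat, r)
        beta = (rho_new / rho) * (alpha / omega)
        p = [ri + beta * (pi - omega * vi) for ri, pi, vi in zip(r, p, v)]
        y = ilu0_solve(LU, p)
        v = matvec(A, y)
        alpha = rho_new / dot(r_hat, v)
        s = [ri - alpha * vi for ri, vi in zip(r, v)]
        if math.sqrt(dot(s, s)) <= stop:        # converged after the first half step
            return [xi + alpha * yi for xi, yi in zip(x, y)], it
        z = ilu0_solve(LU, s)
        t = matvec(A, z)
        omega = dot(t, s) / dot(t, t)
        x = [xi + alpha * yi + omega * zi for xi, yi, zi in zip(x, y, z)]
        r = [si - omega * ti for si, ti in zip(s, t)]
        if math.sqrt(dot(r, r)) <= stop:
            return x, it
        rho = rho_new
    raise RuntimeError("BiCGStab did not converge")

if __name__ == "__main__":
    A = laplacian_2d(4, 4)
    b = matvec(A, [1.0] * 16)                   # exact solution is all ones
    x, iterations = bicgstab(A, b, ilu0(A))
    print(iterations, max(abs(xi - 1.0) for xi in x))
```

In this sketch, each step of the forward and backward substitutions in `ilu0_solve` reads entries computed in earlier steps. This is one plausible illustration of the cell-to-cell dependencies that make the preconditioner harder to parallelize than the matrix-vector products and dot products in the outer loop.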
