We present a modern C++17-compatible thread pool implementation, built from scratch with high-performance scientific computing in mind. The thread pool is implemented as a single lightweight, self-contained class with no dependencies other than the C++17 standard library, which makes it highly portable. In particular, our implementation does not use OpenMP or any other high-level multithreading API, and therefore gives the programmer precise low-level control over the details of the parallelization, which permits more robust optimizations. The thread pool was extensively tested on both AMD and Intel CPUs with up to 40 cores and 80 threads. This paper provides motivation, detailed usage instructions, and performance tests.
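To make the idea of a single self-contained thread pool class concrete, the following is a minimal sketch that uses only the C++17 standard library. It is not the class described in this paper: the names `simple_thread_pool`, `submit`, and `parallel_sum` are hypothetical and chosen purely for illustration, and the design (one mutex-protected task queue, one condition variable, futures for results) is an assumption about what such a class might look like in its simplest form.

```cpp
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <functional>
#include <future>
#include <memory>
#include <mutex>
#include <queue>
#include <thread>
#include <type_traits>
#include <utility>
#include <vector>

// Hypothetical illustration only; not the class presented in the paper.
class simple_thread_pool {
public:
    explicit simple_thread_pool(std::size_t thread_count) {
        assert(thread_count > 0);
        workers.reserve(thread_count);
        for (std::size_t i = 0; i < thread_count; ++i)
            workers.emplace_back([this] { worker_loop(); });
    }

    // Let the workers drain the queue, then join them.
    ~simple_thread_pool() {
        {
            std::lock_guard<std::mutex> lock(queue_mutex);
            stopping = true;
        }
        task_available.notify_all();
        for (std::thread& worker : workers) worker.join();
    }

    simple_thread_pool(const simple_thread_pool&) = delete;
    simple_thread_pool& operator=(const simple_thread_pool&) = delete;

    // Queue a callable taking no arguments; return a future for its result.
    template <typename F>
    auto submit(F&& f) -> std::future<std::invoke_result_t<std::decay_t<F>&>> {
        using R = std::invoke_result_t<std::decay_t<F>&>;
        // std::function requires copyable targets, so the move-only
        // packaged_task is held through a shared_ptr.
        auto task = std::make_shared<std::packaged_task<R()>>(std::forward<F>(f));
        std::future<R> result = task->get_future();
        {
            std::lock_guard<std::mutex> lock(queue_mutex);
            tasks.emplace([task] { (*task)(); });
        }
        task_available.notify_one();
        return result;
    }

private:
    void worker_loop() {
        for (;;) {
            std::function<void()> task;
            {
                std::unique_lock<std::mutex> lock(queue_mutex);
                task_available.wait(lock, [this] { return stopping || !tasks.empty(); });
                if (tasks.empty()) return;  // reached only when stopping
                task = std::move(tasks.front());
                tasks.pop();
            }
            task();  // run outside the lock so other workers can proceed
        }
    }

    std::mutex queue_mutex;
    std::condition_variable task_available;
    std::queue<std::function<void()>> tasks;
    bool stopping = false;
    std::vector<std::thread> workers;
};

// Hypothetical usage helper: sum 1..n using one task per chunk.
inline long long parallel_sum(simple_thread_pool& pool, long long n, int chunks) {
    std::vector<std::future<long long>> parts;
    for (int c = 0; c < chunks; ++c) {
        const long long begin = n * c / chunks + 1;
        const long long end = n * (c + 1) / chunks;
        parts.push_back(pool.submit([begin, end] {
            long long sum = 0;
            for (long long i = begin; i <= end; ++i) sum += i;
            return sum;
        }));
    }
    long long total = 0;
    for (auto& part : parts) total += part.get();
    return total;
}
```

With this sketch, a caller would construct the pool once, for example `simple_thread_pool pool(std::thread::hardware_concurrency());` on a system where that call returns a nonzero value, and then pass it to `parallel_sum(pool, n, chunks)`. The chunk count is set explicitly by the caller, which illustrates the kind of low-level control over work division that a hand-written pool allows. On most toolchains the program must be linked with thread support (for example `-pthread` with GCC or Clang).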