This study explores INT8-based emulation as a way to accelerate traditional FP64-based HPC workloads on modern GPU architectures. Using SCILIB-Accel, an automatic BLAS offload tool for cache-coherent Unified Memory Architecture, we emulate the FP64 matrix multiplications in the LSMS CPU application in the MuST suite without code changes. We find that accuracy depends on both the arithmetic precision and the properties of the operator, and that this dependence can be managed through tunable-precision emulation. Unlike traditional mixed-precision approaches, this method preserves the original algorithms while optimizing hardware utilization. We demonstrate the potential to improve accuracy and performance at the same time. This work highlights the potential of AI-driven hardware to transform HPC and advocates adaptive precision strategies in future scientific computing.
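To make the idea of tunable-precision emulation concrete, the following is a minimal sketch of one way an FP64 matrix product can be assembled from small-integer products. The abstract does not specify the emulation algorithm, so the slicing scheme here (per-row and per-column power-of-two scaling followed by splitting each value into 6-bit signed digits that fit in INT8) is an assumption chosen for illustration. The names `split_rows`, `emulated_matmul`, `SLICE_BITS`, and `num_slices` are hypothetical and are not part of SCILIB-Accel, LSMS, or any BLAS library.

```python
import math

SLICE_BITS = 6  # assumed digit width; digits lie in [-63, 63] and fit in int8


def split_rows(mat, num_slices):
    """Split each row into integer digit slices plus one exponent per row."""
    exps = []
    slices = [[] for _ in range(num_slices)]
    for row in mat:
        biggest = max(abs(x) for x in row)
        # frexp returns biggest = f * 2**e with 0.5 <= f < 1, so |x| / 2**e < 1
        e = math.frexp(biggest)[1] if biggest else 0
        exps.append(e)
        rems = [math.ldexp(x, -e) for x in row]
        for s in range(num_slices):
            digits = []
            for j, r in enumerate(rems):
                r = math.ldexp(r, SLICE_BITS)  # shift the next 6 bits up
                q = int(r)                     # truncate toward zero
                digits.append(q)
                rems[j] = r - q                # remainder stays below 1 in magnitude
            slices[s].append(digits)
    return exps, slices


def emulated_matmul(a, b, num_slices):
    """Approximate a @ b using only small-integer dot products and scaling."""
    ea, sa = split_rows(a, num_slices)
    # Scale B per column by slicing the rows of its transpose.
    eb, sb = split_rows([list(col) for col in zip(*b)], num_slices)
    m, n = len(a), len(b[0])
    c = [[0.0] * n for _ in range(m)]
    for s in range(num_slices):
        for t in range(num_slices):
            weight = -SLICE_BITS * (s + t + 2)
            for i in range(m):
                for j in range(n):
                    # Exact integer dot product of two digit vectors.
                    acc = sum(x * y for x, y in zip(sa[s][i], sb[t][j]))
                    c[i][j] += math.ldexp(float(acc), weight + ea[i] + eb[j])
    return c


def max_abs_error(c, ref):
    """Largest elementwise difference between two matrices."""
    return max(abs(x - y) for rc, rr in zip(c, ref) for x, y in zip(rc, rr))


a = [[1.5, -2.25, 0.1], [3.0, 0.7, -0.001]]
b = [[0.3, 2.0], [-1.1, 0.5], [4.0, -0.25]]
reference = [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

for k in (2, 4, 9):
    print(k, max_abs_error(emulated_matmul(a, b, k), reference))
```

In this sketch `num_slices` is the precision knob: more slices recover more mantissa bits, at the cost of `num_slices ** 2` integer products per output block. On a GPU the integer dot products are the part that would presumably run on INT8 matrix units, while the scaling and final accumulation remain in floating point.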