Large Language Models (LLMs), such as GPT-4 and DeepSeek, have been applied to a wide range of domains in software engineering. However, their potential in the context of High-Performance Computing (HPC) remains largely unexplored. This paper evaluates how well DeepSeek, a recent LLM, performs in generating a set of HPC benchmark codes: a conjugate gradient solver, a parallel heat equation solver, parallel matrix multiplication, DGEMM, and the STREAM triad operation. We analyze DeepSeek's code generation capabilities for traditional HPC languages such as C++ and Fortran, as well as Julia and Python. The evaluation covers code correctness, performance, and scaling across different configurations and matrix sizes. We also provide a detailed comparison between DeepSeek and another widely used tool, GPT-4. Our results demonstrate that while DeepSeek generates functional code for HPC tasks, it lags behind GPT-4 in the scalability and execution efficiency of the generated code.
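For reference, the STREAM triad mentioned above is the bandwidth-bound kernel a[i] = b[i] + s * c[i]. The following is a minimal illustrative sketch in C++ with OpenMP, not taken from the paper's experiments; the array size, scalar value, and timing approach are arbitrary assumptions.

```cpp
// Illustrative STREAM triad sketch (assumed parameters, not the paper's setup).
// Compile with: g++ -O3 -fopenmp stream_triad.cpp
#include <cstdio>
#include <vector>
#include <omp.h>

int main() {
    const std::size_t N = 1 << 26;        // ~67M elements (assumed size)
    const double scalar = 3.0;            // assumed scalar
    std::vector<double> a(N), b(N, 1.0), c(N, 2.0);

    double t0 = omp_get_wtime();
    #pragma omp parallel for
    for (std::size_t i = 0; i < N; ++i)
        a[i] = b[i] + scalar * c[i];      // triad: a = b + s*c
    double t1 = omp_get_wtime();

    // Effective bandwidth: 3 arrays of N doubles moved per iteration sweep.
    double gb = 3.0 * N * sizeof(double) / 1e9;
    std::printf("STREAM triad: %.2f GB/s\n", gb / (t1 - t0));
    return 0;
}
```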