This study presents a benchmarking analysis of the Qualcomm Cloud AI 100 Ultra (QAic) accelerator for large language model (LLM) inference, evaluating its energy efficiency (throughput per watt) and performance against leading NVIDIA (A100, H200) and AMD (MI300A) GPUs within the National Research Platform (NRP) ecosystem. A total of 15 open-source LLMs, ranging from 117 million to 90 billion parameters, are served using the vLLM framework. The QAic inference cards deliver competitive energy efficiency, achieving favorable throughput per watt in most of the evaluated configurations. These findings offer insights into the potential of the Qualcomm Cloud AI 100 Ultra for high-performance computing (HPC) applications within the NRP.