Since the release of ChatGPT in November 2022, large language models (LLMs) have seen considerable success, including in the open-source community, where many open-weight models are now available. However, the hardware requirements for deploying such a service are often unknown and difficult to evaluate in advance. To facilitate this process, we conducted numerous tests at the Centre Inria de l'Universit\'e de Bordeaux. In this article, we compare the performance of several models of different sizes (mainly Mistral and LLaMa) as a function of the available GPUs, using vLLM, a Python library designed to optimize the inference of these models. Our results provide valuable information for private and public organizations wishing to deploy LLMs, allowing them to evaluate the performance of different models on their available hardware. This study thus contributes to facilitating the adoption and use of large language models in various application domains.