Since the release of ChatGPT in November 2022, large language models (LLMs) have seen considerable success, including in the open-source community, where many open-weight models are now available. However, the hardware requirements to deploy such a service are often unknown and difficult to evaluate in advance. To facilitate this process, we conducted numerous tests at the Centre Inria de l'Universit\'e de Bordeaux. In this article, we compare the inference performance of several models of different sizes (mainly Mistral and LLaMa) across different GPU configurations, using vLLM, a Python library designed to optimize the inference of these models. Our results provide valuable guidance for private and public organizations wishing to deploy LLMs, allowing them to evaluate the performance of different models on their available hardware. This study thus contributes to facilitating the adoption and use of large language models in various application domains.