Although Large Language Models (LLMs) represent a revolution in the way we interact with computers, making it possible to pose complex questions and to reason over a sequence of statements, their use is restricted by the need for dedicated hardware. In this study, we evaluate the performance of LLMs based on the 7- and 13-billion-parameter LLaMA models, subjected to a quantization process and run on home hardware. The models considered were Alpaca, Koala, and Vicuna. To evaluate the effectiveness of these models, we developed a database containing 1,006 questions from the ENEM (Brazilian National Secondary School Exam). Our analysis revealed that the best-performing models achieved an accuracy of approximately 46% on the original Portuguese texts of the questions and 49% on their English translations. In addition, we evaluated the computational efficiency of the models by measuring the time required for execution. On average, the 7- and 13-billion-parameter LLMs took approximately 20 and 50 seconds, respectively, to process the queries on a machine equipped with an AMD Ryzen 5 3600X processor.