Large Language Models (LLMs) have demonstrated an impressive ability to solve a wide range of tasks without being explicitly fine-tuned on task-specific datasets. However, deploying LLMs in the real world is not trivial, as it requires substantial computing resources. In this paper, we investigate whether smaller, compact LLMs are a good alternative to comparatively larger LLMs as a way to reduce the significant costs of using LLMs in the real world. To this end, we study the meeting summarization task in a real-world industrial environment and conduct extensive experiments comparing the performance of fine-tuned compact LLMs (e.g., FLAN-T5, TinyLLaMA, LiteLLaMA) with that of zero-shot larger LLMs (e.g., LLaMA-2, GPT-3.5, PaLM-2). We observe that most smaller LLMs, even after fine-tuning, fail to outperform larger zero-shot LLMs on meeting summarization datasets. However, a notable exception is FLAN-T5 (780M parameters), which performs on par with or even better than many zero-shot larger LLMs (from 7B to above 70B parameters) while being significantly smaller. This makes compact LLMs like FLAN-T5 a suitable cost-efficient solution for real-world industrial deployment.