The entry of large language models (LLMs) into research and commercial spaces has led to a trend of ever-larger models, with initial promises of generalisability, followed by a widespread desire to downsize and create specialised models without the need for full fine-tuning, using Parameter-Efficient Fine-Tuning (PEFT) methods. We present an investigation into the suitability of different PEFT methods for clinical decision-making tasks, across a range of model sizes, including extremely small models with as few as $25$ million parameters. Our analysis shows that the performance of most PEFT approaches varies significantly from one task to another, with the exception of LoRA, which maintains relatively high performance across all model sizes and tasks, typically approaching or matching fully fine-tuned performance. The effectiveness of PEFT methods in the clinical domain is evident, particularly for specialised models which can operate on low-cost, in-house computing infrastructure. The advantages of these models, in terms of speed and reduced training costs, dramatically outweigh any performance gain from large foundation LLMs. Furthermore, we highlight how domain-specific pre-training interacts with PEFT methods and model size, and discuss how these factors interplay to provide the best efficiency-performance trade-off. Full code available at: tbd.