This paper addresses the challenge of efficiently fine-tuning large language models (LLMs) by exploring data efficiency and hyperparameter optimization. We investigate the minimum amount of data required for effective fine-tuning and propose a novel hyperparameter optimization method that leverages early-stage model performance. Our experiments demonstrate that fine-tuning with as few as 200 samples can improve model accuracy from 70\% to 88\% on a product attribute extraction task. We identify a saturation point of approximately 6,500 samples, beyond which additional data yields diminishing returns. Our proposed Bayesian hyperparameter optimization method, which evaluates models at 20\% of the total training time, correlates strongly with final model performance: 4 of the 5 best early-stage models remain in the top 5 at completion. This approach yielded a 2\% accuracy improvement over baseline models on an independent test set. These findings offer actionable insights for practitioners, potentially reducing computational cost and reliance on large datasets while improving the overall performance of fine-tuned LLMs.
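To make the early-evaluation idea concrete, the following is a minimal sketch, not the paper's implementation: Optuna's default TPE sampler stands in for the paper's Bayesian optimizer, each candidate is scored after only 20\% of the training budget, and the best configuration is then retrained in full. \texttt{TOTAL\_STEPS}, the search space, and the synthetic \texttt{train\_and\_evaluate} surrogate are illustrative assumptions, not values from the paper.

\begin{verbatim}
import math
import optuna

TOTAL_STEPS = 5000      # assumed full fine-tuning budget (illustrative)
EARLY_FRACTION = 0.2    # score candidates at 20% of total training time

def train_and_evaluate(lr: float, batch_size: int, steps: int) -> float:
    """Hypothetical stand-in for a fine-tuning run: train for `steps`
    steps and return validation accuracy. A smooth synthetic surrogate
    is used here so the sketch runs end to end (batch_size is ignored
    by this toy surrogate)."""
    peak = 0.88 - 0.5 * abs(math.log10(lr) + 4.5)  # best near lr = 3e-5
    progress = steps / TOTAL_STEPS
    return max(0.0, peak * (0.7 + 0.3 * progress))

def objective(trial: optuna.Trial) -> float:
    lr = trial.suggest_float("lr", 1e-6, 1e-3, log=True)
    batch_size = trial.suggest_categorical("batch_size", [8, 16, 32])
    # Score each candidate at the 20% checkpoint only; the paper reports
    # this early score tracks final performance closely.
    return train_and_evaluate(lr, batch_size, int(TOTAL_STEPS * EARLY_FRACTION))

# Optuna's default TPE sampler performs Bayesian optimization.
study = optuna.create_study(direction="maximize")
study.optimize(objective, n_trials=25)

# Retrain the most promising configuration on the full budget.
final_acc = train_and_evaluate(**study.best_params, steps=TOTAL_STEPS)
print(study.best_params, round(final_acc, 3))
\end{verbatim}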