Although Large Language Models (LLMs) achieve state-of-the-art performance on many tasks, their massive scale often leads to high computational and environmental costs, limiting their accessibility. Parameter-Efficient Fine-Tuning (PEFT) methods address this challenge by reducing the number of trainable parameters while maintaining strong downstream performance. Despite advances in PEFT methods, current evaluations remain limited in terms of evaluated models and datasets, and are difficult to reproduce. To bridge this gap, we introduce PEFT-Bench, a unified end-to-end benchmark for evaluating diverse PEFT methods on autoregressive LLMs. We demonstrate its use across 27 NLP datasets and 7 PEFT methods. To account for different PEFT training and inference factors, we also introduce the PEFT Soft Cost Penalties (PSCP) metric, which takes trainable parameters, inference speed, and training memory usage into account.