With their remarkable capabilities, large language models (LLMs) have become essential components of numerous NLP applications, and parameter-efficient finetuning, especially LoRA, has gained popularity as a lightweight approach to model customization. Meanwhile, various dropout methods, originally designed for full finetuning in which all parameters are updated, alleviate the overfitting associated with excessive parameter redundancy. A possible contradiction therefore arises between the negligible number of trainable parameters in LoRA and the effectiveness of previous dropout methods, and it has been largely overlooked. To fill this gap, we first confirm that parameter-efficient LoRA is also prone to overfitting. We then revisit transformer-specific dropout methods and establish their equivalence and distinctions both mathematically and empirically. Building on this comparative analysis, we introduce a unified framework for a comprehensive investigation, which instantiates these methods along three dimensions: dropping position, structural pattern, and compensation measure. Through this framework, we reveal how the preferences among these methods and their relative performance change when only a limited number of parameters are trainable. The framework also allows us to combine the most favorable aspects into a novel dropout method named HiddenKey. Extensive experiments verify the remarkable superiority and sufficiency of HiddenKey across multiple models and tasks, highlighting it as the preferred approach for high-performance, parameter-efficient finetuning of LLMs.
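To make the setting concrete, the following is a minimal sketch of a LoRA-style forward pass with dropout applied on the low-rank path. It is not the HiddenKey method, whose details the abstract does not give. It only illustrates that the pretrained weight stays frozen while a small pair of matrices is trained, and that dropout can still be placed on that trainable path. The function names (`matvec`, `dropout`, `lora_forward`), the element-wise dropping pattern, the rescaling by `1 / (1 - p)`, and the choice to drop the adapter input are all assumptions made for illustration.

```python
import random

def matvec(W, x):
    """Multiply matrix W (list of rows) by vector x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def dropout(x, p, rng, training=True):
    """Element-wise inverted dropout (assumed pattern).

    Each entry is zeroed with probability p; survivors are rescaled
    by 1 / (1 - p) so the expected value is unchanged.
    """
    if not training or p == 0.0:
        return list(x)
    keep = 1.0 - p
    return [xi / keep if rng.random() < keep else 0.0 for xi in x]

def lora_forward(x, W0, A, B, scale, p, rng, training=True):
    """Compute y = W0 x + scale * B (A dropout(x)).

    W0 is the frozen pretrained weight; only A and B would be trained.
    Dropout is applied to the input of the low-rank path only.
    """
    base = matvec(W0, x)
    low_rank = matvec(B, matvec(A, dropout(x, p, rng, training)))
    return [b + scale * l for b, l in zip(base, low_rank)]

# Hypothetical toy sizes: hidden size 4, LoRA rank 2.
x = [1.0, 2.0, 3.0, 4.0]
W0 = [[1.0 if i == j else 0.0 for j in range(4)] for i in range(4)]
A = [[1.0, 0.0, 0.0, 0.0],
     [0.0, 1.0, 0.0, 0.0]]
B = [[1.0, 0.0],
     [0.0, 1.0],
     [0.0, 0.0],
     [0.0, 0.0]]

rng = random.Random(0)
print(lora_forward(x, W0, A, B, scale=0.5, p=0.5, rng=rng, training=False))
# [1.5, 3.0, 3.0, 4.0]
```

In evaluation mode dropout is disabled, so the output is deterministic. In training mode each input entry on the low-rank path is either zeroed or doubled (for `p = 0.5`), which perturbs only the contribution of the trainable matrices.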