We introduce Dynamic Dropout, a novel regularization technique designed to enhance the training efficiency of Transformer models by dynamically adjusting the dropout rate based on training epochs or validation loss improvements. This approach addresses the challenge of balancing regularization and model capacity, which is crucial for achieving fast convergence and high performance. Our method involves modifying the GPT model to accept a variable dropout rate and updating dropout layers during training using schedules such as linear decay, exponential decay, and validation loss-based adjustments. Extensive experiments on the Shakespeare_char dataset demonstrate that Dynamic Dropout significantly accelerates training and improves inference efficiency compared to a baseline model with a fixed dropout rate. The validation loss-based adjustment schedule provided the best overall performance, highlighting the potential of Dynamic Dropout as a valuable technique for training large-scale Transformer models.
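The three schedules named above can be sketched as simple rate functions; this is a minimal illustration of the scheduling logic only, not the authors' implementation, and the function names, default rates, and step sizes are assumptions for the sake of the example. The returned rate would then be written into the model's dropout layers (e.g. each `nn.Dropout.p` in a PyTorch GPT) at the start of every epoch.

```python
def linear_decay(epoch, total_epochs, start=0.2, end=0.0):
    """Linearly anneal the dropout rate from `start` to `end` over training.
    (Illustrative sketch; start/end values are assumed, not from the paper.)"""
    frac = min(epoch / max(total_epochs - 1, 1), 1.0)
    return start + (end - start) * frac

def exponential_decay(epoch, start=0.2, gamma=0.9):
    """Multiply the dropout rate by `gamma` each epoch."""
    return start * (gamma ** epoch)

def val_loss_adjustment(current_rate, val_loss, best_val_loss,
                        step=0.05, floor=0.0, ceiling=0.5):
    """Lower dropout when validation loss improves (model can use more
    capacity); raise it when validation loss stalls (more regularization).
    The step size and bounds here are hypothetical."""
    if val_loss < best_val_loss:
        return max(current_rate - step, floor)
    return min(current_rate + step, ceiling)
```

A training loop would call one of these once per epoch and push the result into every dropout module before the next pass, which is what "modifying the GPT model to accept a variable dropout rate" amounts to in practice.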