By reducing the curvature of the loss surface in parameter space, sharpness-aware minimization (SAM) yields broad robustness improvements under domain transfer. Rather than focusing on parameters, however, this work treats the transferability of representations as the optimization target for out-of-domain generalization in a fine-tuning setup. To encourage the retention of transferable representations, we consider trust region-based fine-tuning methods, which exploit task-specific skills without forgetting task-agnostic representations from pre-training. We unify parameter- and representation-space smoothing approaches by using trust region bounds to inform SAM-style regularizers on both of these optimization surfaces. We propose Trust Region Aware Minimization (TRAM), a fine-tuning algorithm that optimizes for flat minima and smooth, informative representations without forgetting pre-trained structure. We find that TRAM outperforms both sharpness-aware and trust region-based optimization methods on cross-domain language modeling and cross-lingual transfer, where robustness to domain transfer and representation generality are critical for success. TRAM establishes a new standard in training generalizable models with minimal additional computation.
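For readers unfamiliar with the SAM-style update that the abstract builds on, the following is a minimal sketch of a single plain SAM step on a toy quadratic loss. It is not TRAM and not the authors' implementation: the trust region bounds that TRAM uses to inform the regularizer are not modeled here, and the names `loss`, `grad`, `sam_step`, `lr`, and `rho`, as well as the quadratic objective itself, are assumptions chosen only for illustration.

```python
import numpy as np


def loss(w, A):
    """Toy quadratic loss 0.5 * w^T A w (hypothetical stand-in for a fine-tuning loss)."""
    return 0.5 * w @ A @ w


def grad(w, A):
    """Gradient of the toy loss; A is assumed to be symmetric."""
    return A @ w


def sam_step(w, A, lr=0.05, rho=0.05):
    """One plain SAM-style update (illustrative, not TRAM).

    1. Take the gradient at the current parameters.
    2. Move a distance rho along the normalized gradient to approximate
       the worst-case point in a small neighborhood.
    3. Descend using the gradient evaluated at that perturbed point.
    """
    g = grad(w, A)
    eps = rho * g / (np.linalg.norm(g) + 1e-12)  # small constant avoids division by zero
    g_perturbed = grad(w + eps, A)
    return w - lr * g_perturbed


# One sharp direction (curvature 10) and one flat direction (curvature 0.1).
A = np.diag([10.0, 0.1])
w0 = np.array([1.0, 1.0])

w = w0.copy()
for _ in range(50):
    w = sam_step(w, A)

print("initial loss:", loss(w0, A))
print("final loss:  ", loss(w, A))
```

In this sketch `rho` is a fixed neighborhood radius. Per the abstract, TRAM instead lets trust region bounds inform SAM-style regularizers in both parameter and representation space; how those bounds are computed is not specified here and is left out rather than guessed.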