In this paper, we study Test-Time Training (TTT) as a way to handle distribution shifts in speech applications. In particular, we introduce distribution shifts into the test sets of standard speech classification tasks, such as speaker identification and emotion detection, and explore how TTT can help a model adapt to them. Our experiments cover distribution shifts caused by background noise and by natural variations in speech such as gender and age. In these experiments, we identify key challenges with TTT: sensitivity to optimization hyperparameters (e.g., the number of optimization steps and the subset of parameters chosen for TTT) and poor scalability (each test example gets its own set of parameters). Finally, we propose using BitFit, a parameter-efficient fine-tuning algorithm proposed for text applications that fine-tunes only the bias parameters, as a solution to these challenges, and demonstrate that it is consistently more stable than fine-tuning all the parameters of the model.
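To make the parameter-subset idea concrete, the following minimal sketch shows how a BitFit-style selection could pick out only the bias terms, and how much smaller the per-example parameter set becomes when each test example keeps its own adapted copy. The layer names and shapes in `PARAM_SHAPES` are hypothetical and chosen purely for illustration; they are not taken from the paper, and the helper functions are assumptions rather than the authors' implementation.

```python
from math import prod

# Hypothetical parameter shapes for a small speech classifier.
# Names and sizes are illustrative assumptions, not the paper's model.
PARAM_SHAPES = {
    "encoder.linear1.weight": (256, 80),
    "encoder.linear1.bias": (256,),
    "encoder.linear2.weight": (256, 256),
    "encoder.linear2.bias": (256,),
    "classifier.weight": (10, 256),
    "classifier.bias": (10,),
}


def bitfit_subset(param_shapes):
    """Return names of the parameters a BitFit-style update would touch.

    Assumes bias terms can be identified by a name ending in "bias".
    """
    return [name for name in param_shapes if name.endswith("bias")]


def count_params(param_shapes, names):
    """Total number of scalar parameters across the given names."""
    return sum(prod(param_shapes[name]) for name in names)


def per_example_storage(param_shapes, names, num_examples):
    """Scalars stored if every test example keeps its own adapted copy."""
    return count_params(param_shapes, names) * num_examples


all_names = list(PARAM_SHAPES)
bias_names = bitfit_subset(PARAM_SHAPES)

full_count = count_params(PARAM_SHAPES, all_names)
bias_count = count_params(PARAM_SHAPES, bias_names)

print(f"all parameters:  {full_count}")
print(f"bias parameters: {bias_count}")
print(f"fraction adapted with bias-only updates: {bias_count / full_count:.4f}")
```

In this toy configuration the bias terms are well under one percent of all parameters, which is why restricting TTT to them shrinks the per-example state that has to be optimized and stored. The actual ratio depends entirely on the model architecture.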