The recent success of Automatic Speech Recognition (ASR) is largely attributed to the ever-growing amount of training data. However, this trend has made model training prohibitively costly and computationally demanding. While data pruning has been proposed to mitigate this issue by identifying a small subset of relevant data, its application to ASR has barely been explored, and existing works often incur significant overhead to achieve meaningful results. To fill this gap, this paper presents the first investigation of dynamic data pruning for ASR and finds that dynamically selecting 70% of the data is sufficient to reach full-data performance. Furthermore, we introduce Dynamic Data Pruning for ASR (DDP-ASR), which offers several fine-grained pruning granularities specifically tailored to speech datasets, going beyond the conventional pruning of entire time sequences. Our extensive experiments show that DDP-ASR can speed up training by up to 1.6x with negligible performance loss.
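To make "dynamically selecting 70% of the data" concrete, here is a minimal sketch of generic utterance-level dynamic pruning, in which the training subset is re-chosen before every epoch rather than fixed once. The scoring rule (keep the utterances with the highest loss from their last visit, plus a small random slice of the rest so pruned utterances can return), the function name `select_epoch_subset`, and the `explore_ratio` parameter are all illustrative assumptions, not the selection criterion used by DDP-ASR. The sketch also does not reproduce DDP-ASR's finer-grained granularities below the level of whole time sequences.

```python
import random
from typing import Dict, List, Sequence


def select_epoch_subset(
    utterance_ids: Sequence[str],
    last_loss: Dict[str, float],
    keep_ratio: float = 0.7,
    explore_ratio: float = 0.1,
    seed: int = 0,
) -> List[str]:
    """Return the utterance IDs to train on in the next epoch.

    Hypothetical scoring rule (an assumption, not DDP-ASR's criterion):
    keep the utterances with the highest loss from their most recent visit,
    plus a small random sample of the rest so that pruned utterances can
    re-enter training in later epochs.
    """
    if not 0.0 < keep_ratio <= 1.0:
        raise ValueError("keep_ratio must be in (0, 1]")
    rng = random.Random(seed)
    n_keep = int(round(keep_ratio * len(utterance_ids)))
    n_explore = int(round(explore_ratio * n_keep))
    # Utterances that have not been scored yet get +inf, so they are picked first.
    ranked = sorted(
        utterance_ids,
        key=lambda u: last_loss.get(u, float("inf")),
        reverse=True,
    )
    exploit = ranked[: n_keep - n_explore]
    remainder = ranked[n_keep - n_explore:]
    # Random slice of the lower-scored utterances, drawn without replacement.
    explore = rng.sample(remainder, min(n_explore, len(remainder)))
    return exploit + explore


if __name__ == "__main__":
    ids = [f"utt{i}" for i in range(10)]
    # Toy per-utterance losses from a previous epoch (made-up numbers).
    losses = {u: float(i) for i, u in enumerate(ids)}
    subset = select_epoch_subset(ids, losses, keep_ratio=0.7, seed=1)
    print(len(subset), subset)
```

In a training loop, one would call this once per epoch (for example passing the epoch index as `seed`), train only on the returned utterances, and update `last_loss` for those utterances afterwards. Because the subset is recomputed each epoch, an utterance pruned early can still be revisited later, which is the main difference from static pruning that fixes the subset before training.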