Data augmentation is known to improve robustness in speech-processing tasks. In this study, we summarize and compare different data augmentation strategies using the S3PRL toolkit. We explore how HuBERT and wav2vec perform under different augmentation techniques (SpecAugment, Gaussian Noise, Speed Perturbation) on Phoneme Recognition (PR) and Automatic Speech Recognition (ASR) tasks. We evaluate model performance in terms of phoneme error rate (PER) for PR and word error rate (WER) for ASR. In our experiments, SpecAugment slightly improves the performance of HuBERT and wav2vec on the original dataset. We also show that models trained on data augmented with Gaussian Noise and Speed Perturbation are more robust when tested on augmented test sets.
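
Both evaluation metrics named above are edit-distance ratios: the number of substitutions, insertions, and deletions needed to turn the hypothesis into the reference, divided by the number of reference tokens (words for WER, phonemes for PER). The following is a minimal sketch of that definition in plain Python. The function names `edit_distance` and `error_rate` are hypothetical and are not part of S3PRL, and pooling edits over the whole corpus before dividing is an assumption about how the scores are aggregated.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two token sequences (lists of words or phonemes)."""
    # prev[j] holds the distance between the reference prefix seen so far and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]  # distance from ref[:i] to an empty hypothesis
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(
                prev[j] + 1,         # deletion: reference token missing from hypothesis
                curr[j - 1] + 1,     # insertion: extra token in hypothesis
                prev[j - 1] + cost,  # substitution, or match when cost is 0
            ))
        prev = curr
    return prev[-1]


def error_rate(refs, hyps):
    """Corpus-level error rate: total edits divided by total reference tokens.

    Assumption: edits are pooled over all utterances before dividing.
    """
    total_edits = sum(edit_distance(r, h) for r, h in zip(refs, hyps))
    total_tokens = sum(len(r) for r in refs)
    if total_tokens == 0:
        raise ValueError("references contain no tokens")
    return total_edits / total_tokens


# WER: tokens are words. One substitution (sat -> sit) and one insertion (down).
wer = error_rate(["the cat sat".split()], ["the cat sit down".split()])

# PER: tokens are phonemes. One substitution (ae -> ah).
per = error_rate([["k", "ae", "t"]], [["k", "ah", "t"]])
```

Because insertions count as errors, either rate can exceed 1.0 when the hypothesis is much longer than the reference.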