Alternative Pseudo-Labeling for Semi-Supervised Automatic Speech Recognition

When labeled data is scarce, semi-supervised learning with pseudo-labeling can significantly improve the performance of automatic speech recognition. However, pseudo-labels are often noisy and contain many incorrect tokens. Treating noisy labels as ground truth in the loss function results in suboptimal performance. Previous work has attempted to mitigate this issue either by filtering out the noisiest pseudo-labels or by improving the overall quality of pseudo-labels. While these methods are effective to some extent, it is unrealistic to eliminate incorrect tokens from pseudo-labels entirely. In this work, we propose a novel framework named alternative pseudo-labeling, which tackles the issue of noisy pseudo-labels from the perspective of the training objective. The framework comprises several components. First, a generalized CTC loss function is introduced to handle noisy pseudo-labels by accepting alternative tokens in the positions of incorrect tokens. Applying this loss function in pseudo-labeling requires detecting the incorrect tokens in the predicted pseudo-labels. We adopt a confidence-based error detection method that identifies incorrect tokens by comparing their confidence scores with a given threshold, which in turn requires the confidence scores to be discriminative. Hence, the second proposed technique is a contrastive CTC loss function that widens the confidence gap between correctly and incorrectly predicted tokens, thereby improving error detection. Finally, obtaining satisfactory performance with confidence-based error detection typically requires extensive threshold tuning. Instead, we propose an automatic thresholding method that uses labeled data as a proxy for determining the threshold, removing the need for manual tuning.
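
As an illustration of the confidence-based error detection and automatic thresholding described above, the following is a minimal sketch rather than the method as implemented in this work. It assumes that per-token confidence scores are already available and that, on labeled data, each predicted token can be marked as correct or incorrect by comparing it with the reference. The function names `flag_low_confidence` and `pick_threshold` are hypothetical, and selecting the threshold by maximizing error-detection F1 is an assumed criterion that the abstract does not specify. The generalized and contrastive CTC losses are not covered.

```python
from typing import List, Sequence


def flag_low_confidence(confidences: Sequence[float], threshold: float) -> List[bool]:
    """Mark a token as a suspected error when its confidence is below the threshold."""
    return [c < threshold for c in confidences]


def detection_f1(flags: Sequence[bool], is_error: Sequence[bool]) -> float:
    """F1 score of the error flags against the known token errors."""
    tp = sum(1 for f, e in zip(flags, is_error) if f and e)
    fp = sum(1 for f, e in zip(flags, is_error) if f and not e)
    fn = sum(1 for f, e in zip(flags, is_error) if not f and e)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom > 0 else 0.0


def pick_threshold(confidences: Sequence[float], is_error: Sequence[bool]) -> float:
    """Choose a threshold on labeled data, where token errors are known.

    Assumption: the best threshold is the one that maximizes error-detection F1.
    Candidates are the observed confidence values; ties keep the lowest candidate.
    """
    best_threshold, best_f1 = 0.0, -1.0
    for candidate in sorted(set(confidences)):
        f1 = detection_f1(flag_low_confidence(confidences, candidate), is_error)
        if f1 > best_f1:
            best_threshold, best_f1 = candidate, f1
    return best_threshold
```

A threshold chosen this way on labeled utterances could then be passed to `flag_low_confidence` for the tokens of each pseudo-label, and the flagged positions would be the ones where the generalized CTC loss accepts alternative tokens.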
