We introduce Dataset Concealment (DSC), a rigorous new procedure for evaluating and interpreting objective speech quality estimation models. DSC quantifies and decomposes the performance gap between research results and real-world application requirements, while offering context and additional insight into model behavior and dataset characteristics. We also show the benefits of addressing the corpus effect by using the dataset Aligner from AlignNet when training models on multiple datasets. We demonstrate DSC and the improvements from the Aligner using nine training datasets and nine unseen datasets with three well-studied models: MOSNet, NISQA, and a Wav2Vec2.0-based model. DSC provides interpretable views of a model's generalization capabilities and limitations while allowing all available data to be used during training. An additional result is that adding the 1000-parameter dataset Aligner to the 94-million-parameter Wav2Vec model during training significantly improves the resulting model's ability to estimate speech quality for unseen data.