We show how a neural network can be trained on individual intrusive listening test scores to predict a distribution of scores for each pair of reference and coded input stereo or binaural signals. We nickname this method the Generative Machine Listener (GML), as it is capable of generating an arbitrary amount of simulated listening test data. Compared to a baseline system using regression over mean scores, we observe lower outlier ratios (OR) for the mean score predictions and gain straightforward access to confidence interval (CI) predictions. The introduction of data augmentation techniques from the image domain results in a significant increase in CI prediction accuracy, as well as in the Pearson correlation and Spearman rank correlation of the mean scores.
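The abstract does not state which distribution the network outputs. The sketch below assumes, purely for illustration, that for each reference/coded pair the model predicts the mean `mu` and standard deviation `sigma` of a Gaussian over individual listener scores on a 0 to 100 (MUSHRA-like) scale. It shows why a predicted distribution makes simulated listening tests and CIs easy to obtain: you can either draw any number of synthetic listener scores or compute the CI of the mean directly. The function names `simulate_listening_test`, `mean_and_ci` and `predicted_ci` are hypothetical and are not the authors' implementation.

```python
import math
import random
import statistics


def simulate_listening_test(mu, sigma, n_listeners, seed=None, lo=0.0, hi=100.0):
    """Draw n_listeners simulated scores from an ASSUMED Gaussian N(mu, sigma^2).

    Samples are clipped to the score range [lo, hi] (0..100 is an assumption).
    """
    rng = random.Random(seed)
    return [min(hi, max(lo, rng.gauss(mu, sigma))) for _ in range(n_listeners)]


def mean_and_ci(scores, z=1.96):
    """Mean score and normal-approximation CI (z=1.96 gives ~95%).

    Requires at least two scores, since statistics.stdev needs two points.
    """
    m = statistics.fmean(scores)
    half = z * statistics.stdev(scores) / math.sqrt(len(scores))
    return m, (m - half, m + half)


def predicted_ci(mu, sigma, n_listeners, z=1.96):
    """CI of the mean for a hypothetical panel of n_listeners, computed
    directly from the predicted parameters (ignores the effect of clipping)."""
    half = z * sigma / math.sqrt(n_listeners)
    return mu - half, mu + half


if __name__ == "__main__":
    # Example: one item predicted as mu=72, sigma=12, simulated with 20 listeners.
    scores = simulate_listening_test(72.0, 12.0, 20, seed=0)
    print(mean_and_ci(scores))
    print(predicted_ci(72.0, 12.0, 20))
```

The Monte Carlo route (sampling a simulated panel) and the analytic route (`predicted_ci`) should roughly agree when clipping is rare; sampling is the more general option if the true output distribution is not Gaussian.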