This study investigates the evaluation of multimedia quality models, focusing on the inherent uncertainties in subjective Mean Opinion Score (MOS) ratings due to factors like rater inconsistency and bias. Traditional statistical measures such as Pearson's Correlation Coefficient (PCC), Spearman's Rank Correlation Coefficient (SRCC), and Kendall's Tau (KTAU) often fail to account for these uncertainties, leading to inaccuracies in model performance assessment. We introduce the Constrained Concordance Index (CCI), a novel metric designed to overcome the limitations of existing metrics by considering the statistical significance of MOS differences and excluding comparisons where MOS confidence intervals overlap. Through comprehensive experiments across various domains including speech and image quality assessment, we demonstrate that CCI provides a more robust and accurate evaluation of instrumental quality models, especially in scenarios of low sample sizes, rater group variability, and restriction of range. Our findings suggest that incorporating rater subjectivity and focusing on statistically significant pairs can significantly enhance the evaluation framework for multimedia quality prediction models. This work not only sheds light on the overlooked aspects of subjective rating uncertainties but also proposes a methodological advancement for more reliable and accurate quality model evaluation.
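The core idea behind CCI — a concordance index computed only over stimulus pairs whose MOS confidence intervals do not overlap — can be sketched as follows. This is a minimal illustration of the concept, not the paper's exact formulation: the function names, the normal-approximation confidence interval, and the 1.96 critical value are assumptions made for the example.

```python
import math
from itertools import combinations
from statistics import mean, stdev

def mos_ci(ratings, z=1.96):
    # Approximate 95% confidence interval for the mean opinion score,
    # using a normal approximation (an assumption for this sketch).
    m = mean(ratings)
    half = z * stdev(ratings) / math.sqrt(len(ratings))
    return m - half, m + half

def cci(ratings_per_stimulus, predictions):
    # Concordance index restricted to stimulus pairs whose MOS
    # confidence intervals do not overlap.
    mos = [mean(r) for r in ratings_per_stimulus]
    cis = [mos_ci(r) for r in ratings_per_stimulus]
    concordant = total = 0
    for i, j in combinations(range(len(mos)), 2):
        lo_i, hi_i = cis[i]
        lo_j, hi_j = cis[j]
        if hi_i >= lo_j and hi_j >= lo_i:
            continue  # CIs overlap: MOS difference not significant, skip pair
        total += 1
        # A pair is concordant when the model orders it the same way as MOS.
        if (mos[i] - mos[j]) * (predictions[i] - predictions[j]) > 0:
            concordant += 1
    return concordant / total if total else float("nan")
```

With this restriction, pairs whose subjective ranking is statistically uncertain no longer penalize (or reward) the model, which is what makes the metric robust to low sample sizes and rater variability.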