We introduce Parallel Paraphrasing ($\text{Para}_\text{both}$), an augmentation method for translation metrics that automatically paraphrases both the reference and the hypothesis. The method counteracts the typically misleading results that speech translation metrics such as WER, CER, and BLEU produce when only a single reference is available. We also present two new datasets created explicitly to measure the quality of metrics intended for Swiss German speech-to-text systems. On these datasets, we show that applying our method to commonly used metrics significantly improves their correlation with human quality perception.
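The idea of scoring against paraphrases of both sides can be illustrated with a minimal sketch. The abstract does not say how paraphrases are generated or how the scores are combined, so everything here is an assumption: paraphrases are passed in as plain lists (no real paraphrasing model is called), the best score over all reference/hypothesis pairs is kept, and the names `wer` and `para_both_score` are hypothetical. The example sentences are made up for illustration.

```python
from itertools import product
from typing import Callable, Sequence


def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: word-level edit distance divided by reference length."""
    ref, hyp = reference.split(), hypothesis.split()
    if not ref:
        return 0.0 if not hyp else 1.0
    # Row of edit distances between the empty reference prefix and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]  # distance between ref[:i] and the empty hypothesis
        for j, h in enumerate(hyp, start=1):
            curr.append(min(
                prev[j] + 1,             # delete a reference word
                curr[j - 1] + 1,         # insert a hypothesis word
                prev[j - 1] + (r != h),  # substitute, or match at no cost
            ))
        prev = curr
    return prev[-1] / len(ref)


def para_both_score(
    reference: str,
    hypothesis: str,
    ref_paraphrases: Sequence[str],
    hyp_paraphrases: Sequence[str],
    metric: Callable[[str, str], float] = wer,
) -> float:
    """Assumed aggregation: lowest error over all (reference, hypothesis) variants.

    The original sentences are always included, so the result is never
    worse than the plain single-reference score.
    """
    refs = [reference, *ref_paraphrases]
    hyps = [hypothesis, *hyp_paraphrases]
    return min(metric(r, h) for r, h in product(refs, hyps))


# Made-up example: the hypothesis is a valid reordering of the reference.
reference = "i am going home today"
hypothesis = "today i am going home"

plain = wer(reference, hypothesis)
augmented = para_both_score(
    reference,
    hypothesis,
    ref_paraphrases=["today i am going home"],  # would come from a paraphraser
    hyp_paraphrases=["i am going home today"],
)
print(f"plain WER: {plain:.2f}, with paraphrases: {augmented:.2f}")
```

Taking the minimum suits error rates such as WER and CER. For a similarity score such as BLEU, where higher is better, the same sketch would take the maximum instead.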