Automatic measures of similarity between utterances are invaluable for training speech synthesizers, evaluating machine translation, and assessing learner productions. While measures exist for semantic similarity and prosodic similarity, there are as yet none for pragmatic similarity. To enable the training of such measures, we developed the first collection of human judgments of pragmatic similarity between utterance pairs. Each pair consists of an utterance extracted from a recorded dialog and a re-enactment of that utterance. Re-enactments were done under various conditions designed to create a variety of degrees of similarity. Each pair was rated on a continuous scale by 6 to 9 judges. The average inter-judge correlation was as high as 0.72 for English and 0.66 for Spanish. We make this data available at https://github.com/divettemarco/PragSim .
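As an illustration of the agreement statistic mentioned above, an average inter-judge correlation can be computed by correlating each pair of judges over the utterance pairs they both rated and then averaging the results. The following is a minimal sketch under stated assumptions: the abstract does not say which correlation coefficient or averaging scheme was used, so Pearson correlation and a plain mean over judge pairs are assumed here, and the function names and the input layout (judge to utterance-pair ID to rating) are hypothetical rather than taken from the released data.

```python
import math
from itertools import combinations
from statistics import mean


def pearson(x, y):
    """Pearson correlation of two equal-length rating lists, or None if undefined."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return None  # a judge gave constant ratings; correlation is undefined
    return cov / (sx * sy)


def average_interjudge_correlation(ratings):
    """Mean pairwise correlation between judges.

    `ratings` maps judge name -> {utterance_pair_id: rating}.
    This layout is an assumption made for illustration only.
    """
    correlations = []
    for j1, j2 in combinations(sorted(ratings), 2):
        # Only compare judges on the utterance pairs both of them rated.
        shared = sorted(set(ratings[j1]) & set(ratings[j2]))
        if len(shared) < 2:
            continue
        r = pearson([ratings[j1][p] for p in shared],
                    [ratings[j2][p] for p in shared])
        if r is not None:
            correlations.append(r)
    if not correlations:
        raise ValueError("no pair of judges has enough shared ratings")
    return mean(correlations)


if __name__ == "__main__":
    # Made-up toy ratings, not values from the dataset.
    toy = {
        "judge_a": {"p1": 1.0, "p2": 2.0, "p3": 3.0},
        "judge_b": {"p1": 2.0, "p2": 4.0, "p3": 6.0},
        "judge_c": {"p1": 3.0, "p2": 2.0, "p3": 1.0},
    }
    print(average_interjudge_correlation(toy))
```

Restricting each comparison to shared items matters because the number of judges per pair varies (6 to 9), so not every judge necessarily rated every utterance pair.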