Hallucination is widely recognized as a critical issue in Simultaneous Machine Translation (SiMT) because source-side information is incomplete at translation time. While many efforts have been made to improve SiMT performance, few attempt to understand and analyze hallucination in SiMT. We therefore conduct a comprehensive analysis of hallucination in SiMT from two perspectives: the distribution of hallucinated words and their use of target-side context. Extensive experiments yield several findings and, in particular, show that hallucination can be alleviated by reducing the overuse of target-side information in SiMT.