Effective communication is fundamental to any interaction, yet challenges arise when participants do not share a common language. Automatic translation systems offer a powerful way to bridge language barriers in such scenarios, but they introduce errors that can lead to misunderstandings and conversation breakdown. A key issue is that current systems fail to incorporate the rich contextual information needed to resolve ambiguities and recover omitted details, resulting in literal, inappropriate, or misaligned translations. In this work, we present a framework to improve large language model-based translation systems by incorporating contextual information in bilingual conversational settings. During training, we leverage context-augmented parallel data, which allows the model to generate translations that are sensitive to the conversational history. During inference, we perform quality-aware decoding with context-aware metrics to select the best translation from a pool of candidates. We validate both components of our framework on two task-oriented domains: customer chat and user-assistant interaction. Across both settings, our framework consistently produces better translations than state-of-the-art systems such as GPT-4o and TowerInstruct, as measured by multiple automatic translation quality metrics on several language pairs. We also show that the resulting model leverages context in an intended and interpretable way, improving consistency between the conveyed message and the generated translations.
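The two ingredients described above, context-augmented inputs and quality-aware selection from a candidate pool, can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration rather than the framework's actual implementation: the `Turn` type, the prompt template in `format_with_context`, the `select_translation` helper, and in particular `toy_context_metric`, a lexical-overlap stand-in for a learned context-aware quality metric.

```python
import re
from dataclasses import dataclass
from typing import Callable, Sequence, Tuple


@dataclass(frozen=True)
class Turn:
    """One earlier turn of the conversation (hypothetical structure)."""
    speaker: str
    text: str


# A metric maps (context, source, candidate) to a score; higher is better.
Metric = Callable[[Sequence[Turn], str, str], float]


def format_with_context(context: Sequence[Turn], source: str) -> str:
    """Prepend the conversation history to the segment to translate.

    The template is hypothetical; it only illustrates the idea of
    context-augmented inputs.
    """
    history = "\n".join(f"{t.speaker}: {t.text}" for t in context)
    return f"Context:\n{history}\n\nTranslate: {source}"


def select_translation(
    context: Sequence[Turn],
    source: str,
    candidates: Sequence[str],
    metric: Metric,
) -> Tuple[str, float]:
    """Score every candidate with the metric and return the best one."""
    if not candidates:
        raise ValueError("candidates must be non-empty")
    scores = [metric(context, source, c) for c in candidates]
    best = max(range(len(candidates)), key=scores.__getitem__)
    return candidates[best], scores[best]


def _tokens(text: str) -> set:
    """Lowercased word tokens of a string."""
    return set(re.findall(r"\w+", text.lower()))


def toy_context_metric(context: Sequence[Turn], source: str, candidate: str) -> float:
    """Toy stand-in for a learned context-aware metric.

    Returns the share of candidate tokens that already occur in the
    conversation history. It ignores the source entirely, which a real
    metric would not do.
    """
    history = set()
    for turn in context:
        history |= _tokens(turn.text)
    cand = _tokens(candidate)
    return len(cand & history) / len(cand) if cand else 0.0


if __name__ == "__main__":
    # Target-side view of the conversation so far (illustrative data).
    ctx = [
        Turn("agent", "Your order has been shipped."),
        Turn("customer", "When will the order arrive?"),
    ]
    pool = [
        "It arrives tomorrow.",
        "The order arrives tomorrow.",
        "He arrives tomorrow.",
    ]
    print(format_with_context(ctx, "Es kommt morgen an."))
    print(select_translation(ctx, "Es kommt morgen an.", pool, toy_context_metric))
```

In this toy run the second candidate is selected because it reuses terms from the history ("the", "order"), while the other two share no tokens with it. In practice the candidate pool would come from sampling the translation model, and the metric would be a learned, context-aware quality estimator rather than token overlap.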