DPO-Tuned Large Language Models for Segmentation in Simultaneous Speech Translation

Simultaneous speech translation requires accurate segmentation to balance translation quality and latency. Recent work such as SHAS has introduced pretrained segmentation models that outperform heuristic rules and are more robust. However, these models are still constrained by supervised learning objectives and do not incorporate alignment with human preferences, which is crucial for natural real-time interpretation. In this work, we propose a segmentation framework based on large language models (LLMs) trained with Direct Preference Optimization (DPO). Through preference alignment, our method enables LLMs to predict natural segmentation points that better meet the demands of real-time translation. We evaluate the system on the ACL 60/60 corpus for three language pairs (English-to-Japanese, English-to-Chinese, and English-to-German), using SeamlessM4T v2 as the translation backbone. Experimental results show that our DPO-tuned LLM achieves higher segmentation accuracy than SHAS and yields consistent improvements in both translation quality (BLEU, COMET) and latency (Average Lagging). Furthermore, this setup allows direct comparison with IWSLT baselines. These findings highlight the potential of preference-tuned LLMs to surpass existing pretrained segmentation models and to advance adaptive, human-aligned simultaneous interpretation.
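The abstract does not spell out the training objective, so the following is only a minimal sketch of the standard DPO loss for a single preference pair, under the assumption that each "response" is an LLM's segmented output for a transcript prefix. The function names `dpo_loss` and `log_sigmoid`, the value `beta=0.1`, and all log-probabilities in the demo are hypothetical illustrations, not the authors' settings. The loss is -log σ(β[(log π_θ(y_w|x) - log π_ref(y_w|x)) - (log π_θ(y_l|x) - log π_ref(y_l|x))]), where y_w is the preferred segmentation and y_l the rejected one.

```python
import math


def log_sigmoid(z: float) -> float:
    """Numerically stable log(sigmoid(z)), computed as -softplus(-z)."""
    # softplus(x) = max(x, 0) + log1p(exp(-|x|)) avoids overflow for large |x|.
    x = -z
    return -(max(x, 0.0) + math.log1p(math.exp(-abs(x))))


def dpo_loss(policy_chosen_logp: float,
             policy_rejected_logp: float,
             ref_chosen_logp: float,
             ref_rejected_logp: float,
             beta: float = 0.1) -> float:
    """Standard DPO loss for one pair (y_w preferred over y_l, same prompt x).

    Each argument is the summed token log-probability log pi(y | x) of a full
    response under the trainable policy or the frozen reference model.
    """
    # Log-ratios of policy vs. reference for each response.
    chosen_logratio = policy_chosen_logp - ref_chosen_logp
    rejected_logratio = policy_rejected_logp - ref_rejected_logp
    # Scaled margin: positive when the policy shifted probability toward y_w.
    margin = beta * (chosen_logratio - rejected_logratio)
    return -log_sigmoid(margin)


if __name__ == "__main__":
    # Hypothetical scores: y_w places a boundary after a complete clause,
    # y_l places it mid-phrase. Numbers are illustrative only.
    untrained = dpo_loss(-12.0, -15.0, -12.0, -15.0)
    improved = dpo_loss(-10.0, -18.0, -12.0, -15.0)
    print(f"policy == reference: {untrained:.4f}")  # log(2), i.e. 0.6931
    print(f"policy prefers y_w:  {improved:.4f}")
```

When the policy still matches the reference, the loss sits at log 2; it falls below that only as the policy, relative to the reference, assigns more probability to the preferred segmentation than to the rejected one. Working in log-probabilities with a stable log-sigmoid avoids underflow for long responses.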

翻译：同步语音翻译需要精确的分段以平衡翻译质量与延迟。近期研究如SHAS引入了预训练分段模型，其性能优于启发式规则。然而，SHAS等分段模型虽经预训练且比启发式方法更稳健，仍受限于监督学习目标，未融入对人类偏好的对齐——这对自然的实时口译至关重要。本研究提出一种基于大语言模型的分段框架，该模型通过直接偏好优化进行训练。借助偏好对齐机制，我们的方法使大语言模型能够预测更符合实时翻译需求的自然分段点。我们在ACL 60/60语料库上针对三个语言对（英语-日语、汉语、德语）评估系统性能，以SeamlessM4T v2作为翻译主干。实验结果表明：经DPO调优的大语言模型比SHAS获得更高的分段准确率，并在翻译质量（BLEU、COMET）与延迟（平均滞后）指标上实现持续提升。此外，该系统可利用IWSLT基线进行直接比较。这些发现凸显了偏好调优的大语言模型超越现有预训练分段模型、推动自适应且符合人类需求的同步口译发展的潜力。