This paper proposes a framework to address data scarcity in Document-Grounded Dialogue Systems (DGDS). Our model leverages high-resource languages to improve dialogue generation in low-resource languages. Specifically, we present CLEM (Cross-Lingual Enhanced Model), a novel pipeline comprising an adversarially trained retrieval stage (a Retriever and a Re-ranker) and a FiD (Fusion-in-Decoder) generator. To further exploit high-resource languages, we also propose an architecture that aligns different languages through training on translated data. Extensive experimental results demonstrate the effectiveness of our model, which achieved 4th place in the DialDoc 2023 Competition. These results indicate that CLEM can serve as a solution to resource scarcity in DGDS and offer useful guidance for multilingual alignment tasks.
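The abstract names the pipeline stages but not their internals. The following is a minimal, dependency-free sketch of the generic retrieve, re-rank, and fusion-in-decoder data flow, written under stated assumptions. The lexical scoring functions and the names `retrieve`, `rerank`, `fid_encoder_inputs`, and `fuse` are hypothetical stand-ins for the neural retriever, re-ranker, and FiD encoder/decoder. The sketch does not reproduce CLEM's adversarial training or its cross-lingual alignment.

```python
from dataclasses import dataclass


@dataclass
class Passage:
    pid: str
    text: str


def tokenize(text: str) -> list[str]:
    # Lowercased whitespace tokenization (illustrative only).
    return text.lower().split()


def retrieve(query: str, corpus: list[Passage], k: int) -> list[Passage]:
    # Stage 1 (hypothetical stand-in for a dense retriever): score every
    # passage in the corpus by raw token overlap with the query, keep top k.
    q = set(tokenize(query))
    ranked = sorted(corpus, key=lambda p: len(q & set(tokenize(p.text))), reverse=True)
    return ranked[:k]


def rerank(query: str, candidates: list[Passage], k: int) -> list[Passage]:
    # Stage 2 (hypothetical stand-in for a cross-encoder re-ranker): rescore
    # only the retrieved candidates with Jaccard similarity, keep top k.
    q = set(tokenize(query))

    def jaccard(p: Passage) -> float:
        t = set(tokenize(p.text))
        union = q | t
        return len(q & t) / len(union) if union else 0.0

    return sorted(candidates, key=jaccard, reverse=True)[:k]


def fid_encoder_inputs(query: str, passages: list[Passage]) -> list[str]:
    # FiD pairs the query with each passage separately, so each pair is
    # encoded independently (encoder cost grows linearly with passage count).
    return [f"question: {query} context: {p.text}" for p in passages]


def fuse(encoded: list[list[str]]) -> list[str]:
    # The FiD decoder attends over the concatenation of all per-passage
    # encoder outputs; token lists stand in for hidden-state sequences here.
    return [tok for seq in encoded for tok in seq]


corpus = [
    Passage("p1", "Renew your passport online at the portal"),
    Passage("p2", "Passport fees are listed on the fees page"),
    Passage("p3", "Vehicle registration requires proof of insurance"),
]
query = "how do I renew my passport"
candidates = retrieve(query, corpus, k=2)
top = rerank(query, candidates, k=2)
inputs = fid_encoder_inputs(query, top)
fused = fuse([tokenize(x) for x in inputs])
print([p.pid for p in top])  # ['p1', 'p2']
```

The split into a cheap first stage over the whole corpus and a more precise second stage over a short candidate list is the usual reason for pairing a retriever with a re-ranker. Fusing in the decoder, rather than concatenating all passages before encoding, lets the generator draw on several passages without a quadratic attention cost over one long input.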