Label projection, which involves obtaining translated labels and texts jointly, is essential for leveraging machine translation to facilitate cross-lingual transfer in structured prediction tasks. Prior research on label projection often compromises translation accuracy by favoring simplified label translation or relying solely on word-level alignments. In this paper, we introduce a novel label projection approach, CLaP, which translates text to the target language and performs contextual translation on the labels using the translated text as the context, ensuring better accuracy for the translated labels. We leverage instruction-tuned language models with multilingual capabilities as our contextual translator, using instructions to impose the constraint that the translated labels appear in the translated text. We benchmark CLaP against other label projection techniques on zero-shot cross-lingual transfer across 39 languages on two representative structured prediction tasks, event argument extraction (EAE) and named entity recognition (NER), showing improvements of over 2.4 F1 for EAE and 1.4 F1 for NER. We further explore the applicability of CLaP on ten extremely low-resource languages to showcase its potential for cross-lingual structured prediction.
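The two-step procedure described above (translate the text, then translate each label with the translated text as context, and require the translated label to appear in that text) can be sketched as follows. This is a minimal illustration under stated assumptions, not the CLaP implementation: `project_labels`, `toy_translate_text`, and `toy_translate_label` are hypothetical names, the dictionary lookups stand in for a machine translation system and an instruction-tuned language model, and the presence constraint is checked here by exact substring match, which is a simplification.

```python
from typing import Callable, List, Optional, Tuple

# Hypothetical interfaces (assumptions, not an API from the paper):
# a text translator maps source text to target-language text, and a
# contextual label translator maps (source label, translated text)
# to a target-language label.
TextTranslator = Callable[[str], str]
LabelTranslator = Callable[[str, str], str]

# (source label, translated label or None, start offset, end offset)
Projection = Tuple[str, Optional[str], int, int]


def project_labels(
    text: str,
    labels: List[str],
    translate_text: TextTranslator,
    translate_label: LabelTranslator,
) -> Tuple[str, List[Projection]]:
    """Translate the text, then translate each label in context.

    A translated label is kept only if it occurs verbatim in the
    translated text; otherwise it is recorded as None with offsets -1.
    """
    translated_text = translate_text(text)
    projected: List[Projection] = []
    for label in labels:
        candidate = translate_label(label, translated_text)
        start = translated_text.find(candidate) if candidate else -1
        if start != -1:
            projected.append((label, candidate, start, start + len(candidate)))
        else:
            projected.append((label, None, -1, -1))
    return translated_text, projected


# Toy stand-ins for the real translators, for illustration only.
_TEXT_TABLE = {"Obama visited Paris": "Obama visitó París"}
_LABEL_TABLE = {"Obama": "Obama", "Paris": "París", "France": "Francia"}


def toy_translate_text(text: str) -> str:
    return _TEXT_TABLE[text]


def toy_translate_label(label: str, translated_text: str) -> str:
    # A real contextual translator would condition on translated_text;
    # this stub ignores it and does a plain lookup.
    return _LABEL_TABLE[label]


if __name__ == "__main__":
    translated, spans = project_labels(
        "Obama visited Paris",
        ["Obama", "Paris", "France"],
        toy_translate_text,
        toy_translate_label,
    )
    print(translated)
    for span in spans:
        print(span)
```

In this sketch a label whose translation is absent from the translated text is marked as unprojected rather than aligned approximately. How such cases are handled in practice is a design choice that the abstract does not specify.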