Label projection is an effective technique for cross-lingual transfer, extending span-annotated datasets from a high-resource language to low-resource ones. Most approaches perform label projection as a separate step after machine translation, and prior work that combines the two reports degraded translation quality. We re-evaluate this claim with LabelPigeon, a novel framework that jointly performs translation and label projection via XML tags. We design a direct evaluation scheme for label projection, and find that LabelPigeon outperforms baselines and actively improves translation quality in 11 languages. We further assess translation quality across 203 languages and varying annotation complexity, and find consistent improvement, which we attribute to additional fine-tuning. Finally, across 27 languages and three downstream tasks, we report substantial gains in cross-lingual transfer over comparable work, up to +39.9 F1 on NER. Overall, our results demonstrate that XML-tagged label projection provides effective and efficient label transfer without compromising translation quality.
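To make the idea of XML-tagged label projection concrete, the following is a minimal sketch of the data handling around a translation model: spans are wrapped in XML tags before translation, and the tags are parsed back out of the translated text to recover the projected spans. It is an illustration under stated assumptions, not the LabelPigeon implementation: the function names `tag_spans` and `extract_spans`, the use of label names as tag names, and the restriction to non-overlapping, non-nested spans are all hypothetical choices, and the translation step itself is left out.

```python
import re
from typing import List, Tuple

# (start, end, label) with an exclusive end offset; a hypothetical span format.
Span = Tuple[int, int, str]

# Matches <LABEL>...</LABEL>; nested tags are NOT handled in this sketch.
TAG = re.compile(r"<(\w+)>(.*?)</\1>", re.DOTALL)


def tag_spans(text: str, spans: List[Span]) -> str:
    """Wrap each non-overlapping span in an XML tag named after its label."""
    out, cursor = [], 0
    for start, end, label in sorted(spans):
        out.append(text[cursor:start])
        out.append(f"<{label}>{text[start:end]}</{label}>")
        cursor = end
    out.append(text[cursor:])
    return "".join(out)


def extract_spans(tagged: str) -> Tuple[str, List[Span]]:
    """Strip the tags and return the plain text plus spans indexed into it."""
    pieces, spans = [], []
    cursor = 0   # position in the tagged string
    length = 0   # length of the plain text built so far
    for match in TAG.finditer(tagged):
        before = tagged[cursor:match.start()]
        pieces.append(before)
        length += len(before)
        inner = match.group(2)
        spans.append((length, length + len(inner), match.group(1)))
        pieces.append(inner)
        length += len(inner)
        cursor = match.end()
    pieces.append(tagged[cursor:])
    return "".join(pieces), spans


source = "Angela Merkel visited Paris."
tagged_source = tag_spans(source, [(0, 13, "PER"), (22, 27, "LOC")])

# Stand-in for the output of a tag-preserving translation model (assumed).
tagged_target = "<PER>Angela Merkel</PER> besuchte <LOC>Paris</LOC>."
target, projected = extract_spans(tagged_target)
```

Here `tagged_source` is the string that would be given to the translation model, and `projected` holds the labels with character offsets into the translated sentence `target`. A fuller version would also need to handle nested annotations and malformed or dropped tags in the model output.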