Promoting Generalized Cross-lingual Question Answering in Few-resource Scenarios via Self-knowledge Distillation

Despite substantial progress in multilingual extractive Question Answering (QA), building models whose performance is both high and uniformly distributed across languages remains challenging, especially for low-resource languages. We study cross-lingual transfer, focusing mainly on the Generalized Cross-Lingual Transfer (G-XLT) task, in which the question language differs from the context language, a setting that has received limited attention so far. Our approach seeks to enhance cross-lingual QA transfer by starting from a high-performing multilingual model trained on a large-scale dataset and complementing it with a few thousand QA examples aligned across languages. The proposed strategy combines cross-lingual sampling with advanced self-distillation training in generations to tackle this challenge. Notably, we introduce novel mAP@k coefficients to fine-tune the self-knowledge distillation loss, dynamically regulating the teacher model's knowledge to achieve balanced and effective knowledge transfer. We extensively evaluate our approach to assess both XLT and G-XLT capabilities in extractive QA. Results show that our self-knowledge distillation approach outperforms standard cross-entropy fine-tuning by a significant margin. Importantly, compared with a strong baseline that leverages a sizeable volume of machine-translated data, our approach achieves competitive results despite operating in resource-constrained settings, even in zero-shot scenarios. Beyond performance improvements, we offer insights through comprehensive analyses and an ablation study that further substantiate the benefits and limitations of our approach. In essence, we propose a practical solution for improving cross-lingual QA transfer that makes efficient use of limited data.
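
To make the G-XLT setting concrete, the following is a minimal sketch of how question/context language pairs could be drawn from aligned QA examples. It is an illustration under stated assumptions, not the sampling procedure used in this work: the function names `gxlt_pairs` and `sample_gxlt_batch`, the dictionary layout, and the uniform sampling over all language pairs are hypothetical.

```python
import itertools
import random

# Hypothetical aligned example: the same QA item available in two languages.
example = {
    "en": {"question": "Where is the Eiffel Tower?",
           "context": "The Eiffel Tower is in Paris.",
           "answer": "Paris"},
    "es": {"question": "¿Dónde está la Torre Eiffel?",
           "context": "La Torre Eiffel está en París.",
           "answer": "París"},
}

def gxlt_pairs(aligned_example, include_same_language=True):
    """Expand one aligned QA example into (question_lang, context_lang) instances.

    Assumed layout: language code -> {"question", "context", "answer"}.
    Pairs with q_lang == c_lang correspond to XLT, the rest to G-XLT.
    """
    langs = sorted(aligned_example)
    instances = []
    for q_lang, c_lang in itertools.product(langs, repeat=2):
        if q_lang == c_lang and not include_same_language:
            continue
        instances.append({
            "question_lang": q_lang,
            "context_lang": c_lang,
            "question": aligned_example[q_lang]["question"],
            "context": aligned_example[c_lang]["context"],
            # In extractive QA the answer span is taken from the context,
            # so it follows the context language.
            "answer": aligned_example[c_lang]["answer"],
        })
    return instances

def sample_gxlt_batch(aligned_examples, batch_size, seed=0):
    """Uniformly sample a batch of cross-lingual instances (assumed scheme)."""
    rng = random.Random(seed)
    pool = [inst for ex in aligned_examples for inst in gxlt_pairs(ex)]
    return rng.sample(pool, min(batch_size, len(pool)))
```

With N aligned languages, each example expands into N² question/context combinations, of which N(N-1) have mismatched languages; this is one way a few thousand aligned examples could cover many language pairs.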
