We present the Charles University system for the MRL~2023 Shared Task on Multi-lingual Multi-task Information Retrieval. The goal of the shared task was to develop systems for named entity recognition and question answering in several under-represented languages. Our solutions to both subtasks rely on the translate-test approach. We first translate the unlabeled examples into English using a multilingual machine translation model. Then, we run inference on the translated data using a strong task-specific model. Finally, we project the predicted labels back onto the original-language text. To place the inferred tags at the correct positions in the original language, we propose a method that scores candidate positions using a label-sensitive translation model. In both settings, we experiment with finetuning the classification models on the translated data. However, due to a domain mismatch between the development data and the shared task validation and test sets, the finetuned models could not outperform our baselines.
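The label projection step can be illustrated with a minimal sketch. The abstract does not specify how candidate positions are generated or scored, so everything below is an assumption made for illustration: candidates are taken to be all contiguous spans up to a fixed length, one entity is projected at a time, and `toy_score` with its tiny `TOY_LEXICON` is a hypothetical stand-in for the label-sensitive translation model, not the system's actual scorer.

```python
from typing import Callable, List, Optional, Tuple

# A scorer takes the tagged source candidate and the tagged English sentence.
Scorer = Callable[[List[str], List[str]], float]


def tag_span(tokens: List[str], start: int, end: int, label: str) -> List[str]:
    """Wrap tokens[start:end] in <label> ... </label> marker tokens."""
    return (tokens[:start] + [f"<{label}>"] + tokens[start:end]
            + [f"</{label}>"] + tokens[end:])


def project_label(src_tokens: List[str], tagged_en: List[str], label: str,
                  score: Scorer, max_len: int = 4) -> Optional[Tuple[int, int]]:
    """Return the source span (start, end) whose tagged version scores highest."""
    best_span, best_score = None, float("-inf")
    for start in range(len(src_tokens)):
        last_end = min(start + max_len, len(src_tokens))
        for end in range(start + 1, last_end + 1):
            candidate = tag_span(src_tokens, start, end, label)
            s = score(candidate, tagged_en)
            if s > best_score:  # ties keep the earliest candidate
                best_span, best_score = (start, end), s
    return best_span


# --- Hypothetical stand-in for a label-sensitive translation model ---------
TOY_LEXICON = {"Praze": "Prague", "žil": "lived", "v": "in"}  # illustrative only


def inside_tags(tokens: List[str]) -> List[str]:
    """Collect the tokens that sit between opening and closing markers."""
    inside, collected = False, []
    for tok in tokens:
        if tok.startswith("</"):
            inside = False
        elif tok.startswith("<"):
            inside = True
        elif inside:
            collected.append(tok)
    return collected


def toy_score(tagged_src: List[str], tagged_en: List[str]) -> float:
    """Reward tagged source tokens that translate to tagged English tokens;
    penalize tagged source tokens that do not."""
    en_inside = set(inside_tags(tagged_en))
    src_inside = inside_tags(tagged_src)
    hits = sum(1 for t in src_inside if TOY_LEXICON.get(t, t) in en_inside)
    return hits - (len(src_inside) - hits)


src = ["Karel", "Čapek", "žil", "v", "Praze"]
en_loc = ["Karel", "Čapek", "lived", "in", "<LOC>", "Prague", "</LOC>"]
en_per = ["<PER>", "Karel", "Čapek", "</PER>", "lived", "in", "Prague"]

loc_span = project_label(src, en_loc, "LOC", toy_score)
per_span = project_label(src, en_per, "PER", toy_score)
```

In a real setting, `score` would be replaced by a model-based score, for example the likelihood a translation model assigns to the tagged candidate given the tagged English sentence. The exhaustive span enumeration is quadratic in sentence length, which is why the sketch caps span length with `max_len`.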