The persistent disparity in labeled resources between resource-rich and low-resource languages remains a significant impediment for Large Language Models (LLMs). Recent advances in cross-lingual in-context learning (X-ICL), mainly through semantically aligned examples retrieved from multilingual pre-trained transformers, have shown promise in mitigating this issue. However, our investigation reveals that LLMs intrinsically favor in-language semantically aligned cross-lingual instances over direct cross-lingual semantic alignments, with a pronounced disparity in handling time-sensitive queries in the X-ICL setup. Such queries demand sound temporal reasoning ability from LLMs, yet advances in temporal reasoning have predominantly focused on English. This study aims to bridge this gap by improving temporal reasoning capabilities in low-resource languages. To this end, we introduce mTEMPREASON, a temporal reasoning dataset covering languages with varying degrees of resource scarcity, and propose Cross-Lingual Time-Sensitive Semantic Alignment (CLiTSSA), a novel method to improve temporal reasoning in these contexts. To facilitate this, we construct an extension of mTEMPREASON comprising pairs of parallel cross-language temporal queries along with their expected in-language semantic similarity scores. Our empirical results show that CLiTSSA outperforms established baselines across three languages (Romanian, German, and French), three temporal tasks, and a diverse set of four contemporary LLMs. This marks a significant step forward in addressing resource disparity in the context of temporal reasoning across languages.