The persistent disparity in labeled resources between high-resource and low-resource languages remains a significant impediment for Large Language Models (LLMs). Recent advances in cross-lingual in-context learning (X-ICL), chiefly through semantically aligned examples retrieved from multilingual pre-trained transformers, have shown promise in mitigating this issue. However, our investigation reveals that LLMs intrinsically favor in-language semantically aligned cross-lingual instances over direct cross-lingual semantic alignments, with a pronounced performance gap on time-sensitive queries in the X-ICL setup. Such queries demand sound temporal reasoning from LLMs, yet advances in this area have predominantly focused on English. This study aims to bridge this gap by improving temporal reasoning capabilities in low-resource languages. To this end, we introduce mTEMPREASON, a temporal reasoning dataset covering languages with varying degrees of resource scarcity, and propose Cross-Lingual Time-Sensitive Semantic Alignment (CLiTSSA), a novel method to improve temporal reasoning in these contexts. To facilitate this, we construct an extension of mTEMPREASON comprising pairs of parallel cross-language temporal queries along with their anticipated in-language semantic similarity scores. Our empirical results demonstrate the superior performance of CLiTSSA over established baselines across three languages (Romanian, German, and French), three temporal tasks, and a diverse set of four contemporary LLMs. This marks a significant step forward in addressing resource disparity in temporal reasoning across languages.