Cross-lingual semantic textual relatedness is an important research task that addresses challenges in cross-lingual communication and text understanding. It helps establish semantic connections between languages, which is crucial for downstream tasks such as machine translation, multilingual information retrieval, and cross-lingual text understanding. Based on extensive comparative experiments, we choose XLM-R-base as our base model and apply whitening to the pre-trained sentence representations to reduce anisotropy. Additionally, for the given training data, we design a careful data filtering method to alleviate the curse of multilingualism. With our approach, we achieve 2nd place in Spanish, 3rd place in Indonesian, and multiple top-ten results in Track C of the competition. We further conduct a comprehensive analysis to inspire future research on improving performance on cross-lingual tasks.
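The whitening step mentioned above can be sketched as follows. This is a minimal illustration of the standard sentence-embedding whitening transformation (mean-centering followed by a rotation and rescaling that makes the sample covariance the identity), not the authors' exact implementation; all names and the toy data are illustrative:

```python
import numpy as np


def whiten(embeddings: np.ndarray) -> np.ndarray:
    """Whiten sentence embeddings to reduce anisotropy.

    Subtracts the mean, then maps the embeddings through
    W = U * diag(1/sqrt(S)) from the SVD of the covariance,
    so the whitened vectors have identity sample covariance.
    """
    mu = embeddings.mean(axis=0, keepdims=True)
    cov = np.cov((embeddings - mu).T)          # d x d sample covariance
    U, S, _ = np.linalg.svd(cov)               # cov = U diag(S) U^T
    W = U @ np.diag(1.0 / np.sqrt(S))          # whitening matrix
    return (embeddings - mu) @ W


# Toy anisotropic "embeddings": correlated Gaussian vectors.
rng = np.random.default_rng(0)
emb = rng.normal(size=(200, 8)) @ rng.normal(size=(8, 8))
white = whiten(emb)
```

After this transformation, cosine similarities between the whitened vectors are no longer dominated by a few high-variance directions, which is the anisotropy problem the abstract refers to.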