This report presents our participation in the WSDM Cup 2026 shared task on multilingual document retrieval from English queries. The task provides a challenging benchmark for cross-lingual generalization and a natural testbed for evaluating SPLARE, our recently proposed learned sparse retrieval model, which produces generalizable sparse latent representations and is particularly well suited to multilingual retrieval settings. We evaluate five progressively enhanced runs, starting from a SPLARE-7B model and adding lightweight improvements, including reranking with Qwen3-Reranker-4B and simple score fusion strategies. Our results demonstrate the strength of SPLARE relative to state-of-the-art dense baselines such as Qwen3-8B-Embed. More broadly, our submission highlights the continued relevance and competitiveness of learned sparse retrieval beyond English-centric scenarios.