MeVer at CheckThat! 2026: Cluster-Aware Hard-Negative Mining for Multilingual Scientific-Source Retrieval

Identifying the scientific source behind a social media claim requires matching short, informal, and often multilingual claims against large collections of scientific publications, where semantically related papers may act as challenging distractors or false negatives during training. We present our submission to CheckThat! 2026 Task 1 on multilingual scientific-source retrieval, focusing on how hard-negative mining should be adapted to multi-stage retrieval pipelines. We propose cluster-aware hard-negative mining strategies that exploit the semantic structure of retrieved candidate pools to construct more informative training negatives for dense retrieval and reranking. Our experiments show that different hard-negative structures induce different retrieval behaviors: localized cluster negatives tend to favor precision-oriented retrieval, whereas broader non-gold semantic negatives provide stronger candidate coverage and more consistent reranking performance across languages. We further study multiple LLM-based evidence-selection formulations, including direct classification, pairwise comparison, and listwise reranking prompts, and find that constrained classification prompts provide the most reliable final document selection. The final system combines a dense retriever, a multilingual cross-encoder reranker, and a selective LLM-based disagreement resolver, and ranked 6th among 37 submissions in the shared task evaluation. Overall, our results suggest that hard-negative mining should be treated as a stage-aware design problem rather than as a single retrieval optimization strategy.
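The contrast between localized cluster negatives and broader non-gold semantic negatives can be illustrated with a minimal sketch. It is not the submitted system: the greedy seed-based clustering, the cosine threshold, the round-robin sampling, and the names `greedy_clusters` and `mine_negatives` are all assumptions made for illustration, since the abstract does not specify how clusters are formed or sampled.

```python
import math
from typing import Dict, List, Optional, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two dense vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def greedy_clusters(
    embs: Dict[str, Sequence[float]], order: List[str], threshold: float
) -> Dict[str, int]:
    """Hypothetical clustering step: a document joins the first cluster whose
    seed it resembles (cosine >= threshold); otherwise it seeds a new cluster."""
    seeds: List[str] = []
    assignment: Dict[str, int] = {}
    for doc_id in order:
        for cluster_id, seed in enumerate(seeds):
            if cosine(embs[doc_id], embs[seed]) >= threshold:
                assignment[doc_id] = cluster_id
                break
        else:  # no existing seed was similar enough
            seeds.append(doc_id)
            assignment[doc_id] = len(seeds) - 1
    return assignment


def mine_negatives(
    ranked_pool: List[str],
    embs: Dict[str, Sequence[float]],
    gold_id: str,
    k: int,
    threshold: float = 0.8,
) -> Dict[str, List[str]]:
    """Return two negative sets drawn from one retrieved candidate pool.

    local: non-gold candidates that share the gold document's cluster.
    broad: non-gold candidates taken round-robin across all clusters,
           preserving retrieval order inside each cluster.
    """
    assignment = greedy_clusters(embs, ranked_pool, threshold)
    non_gold = [d for d in ranked_pool if d != gold_id]

    # If the gold document was not retrieved, there is no gold cluster
    # and the localized set is empty.
    gold_cluster: Optional[int] = assignment.get(gold_id)
    local = [d for d in non_gold if assignment[d] == gold_cluster][:k]

    by_cluster: Dict[int, List[str]] = {}
    for d in non_gold:
        by_cluster.setdefault(assignment[d], []).append(d)

    broad: List[str] = []
    queues = list(by_cluster.values())
    depth = 0
    while len(broad) < k and any(depth < len(q) for q in queues):
        for q in queues:
            if depth < len(q) and len(broad) < k:
                broad.append(q[depth])
        depth += 1
    return {"local": local, "broad": broad}


# Toy pool: "a" is close to the gold paper, "b" and "c" form a second topic,
# and "d" is unrelated to both.
toy_embs = {
    "gold": [1.0, 0.0],
    "a": [0.9, 0.1],
    "b": [0.0, 1.0],
    "c": [0.1, 0.9],
    "d": [-1.0, 0.0],
}
toy_pool = ["a", "gold", "b", "c", "d"]
toy_result = mine_negatives(toy_pool, toy_embs, gold_id="gold", k=3)
```

In this toy pool the localized set contains only the near-duplicate of the gold paper, while the broad set spreads across all three clusters. That mirrors, in a very simplified way, the trade-off described above between precision-oriented negatives and negatives that favor candidate coverage.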
