The DRAGUN Track at TREC 2025 addresses the growing need for tools that help users evaluate the trustworthiness of online news. We describe the UR_Trecking system submitted to both Task 1 (critical question generation) and Task 2 (retrieval-augmented trustworthiness reporting). Our approach combines LLM-based question generation with semantic filtering, clustering-based diversity enforcement, and several query expansion strategies (including reasoning-based Chain-of-Thought expansion) to retrieve relevant evidence from the MS MARCO V2.1 segmented corpus. Retrieved documents are re-ranked with a monoT5 model and filtered using an LLM relevance judge together with a domain-level trustworthiness dataset. For Task 2, an LLM synthesizes the selected evidence into concise trustworthiness reports with citations. Results from the official evaluation indicate that Chain-of-Thought query expansion and re-ranking substantially improve both relevance and domain trust over baseline retrieval, while question generation achieves only moderate quality and leaves room for improvement. We conclude by outlining the key challenges encountered and suggesting directions for improving robustness and trustworthiness assessment in future iterations of the system.
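To make the evidence-selection stage concrete, the following is a minimal sketch of how re-ranked passages might be filtered by a relevance judge and a domain-level trust table. It is an illustration under stated assumptions, not the submitted system: the names `Passage`, `select_evidence`, and `keyword_judge`, the `min_trust` threshold, the `top_k` cutoff, and the choice to treat unknown domains as untrusted are all hypothetical, and the LLM relevance judge is replaced by a toy keyword-overlap function.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List
from urllib.parse import urlparse


@dataclass
class Passage:
    url: str
    text: str
    rerank_score: float  # assumed to come from a re-ranker such as monoT5


def domain_of(url: str) -> str:
    """Return the lowercased host of a URL without a leading 'www.'."""
    host = urlparse(url).netloc.lower()
    return host[4:] if host.startswith("www.") else host


def select_evidence(
    question: str,
    passages: List[Passage],
    is_relevant: Callable[[str, str], bool],
    domain_trust: Dict[str, float],
    min_trust: float = 0.5,  # hypothetical threshold
    top_k: int = 3,          # hypothetical cutoff
) -> List[Passage]:
    """Keep up to top_k passages, best re-rank score first, that pass both filters."""
    selected: List[Passage] = []
    for p in sorted(passages, key=lambda p: p.rerank_score, reverse=True):
        # Assumption of this sketch: domains missing from the table get trust 0.0.
        if domain_trust.get(domain_of(p.url), 0.0) < min_trust:
            continue
        # The judge is any callable; in practice this would wrap an LLM call.
        if not is_relevant(question, p.text):
            continue
        selected.append(p)
        if len(selected) == top_k:
            break
    return selected


def keyword_judge(question: str, text: str) -> bool:
    """Toy stand-in for an LLM judge: true if any question word longer than
    three characters also appears in the passage."""
    q = {w for w in question.lower().split() if len(w) > 3}
    t = set(text.lower().split())
    return bool(q & t)
```

Checking the cheap domain-trust lookup before calling the judge keeps the number of (potentially expensive) judge calls low; whether the actual system orders the two filters this way is not stated in the abstract.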