This paper presents the db3 team's winning solution for the Meta CRAG-MM Challenge 2025 at KDD Cup'25. To address the challenge's multi-modal, multi-turn question-answering benchmark (CRAG-MM), we developed a comprehensive framework that combines retrieval pipelines tailored to each task with a unified LLM-tuning approach for hallucination control. Our solution features (1) domain-specific retrieval pipelines that handle image-indexed knowledge graphs, web sources, and multi-turn conversations; and (2) advanced refusal training using supervised fine-tuning (SFT), direct preference optimization (DPO), and reinforcement learning (RL). The system placed 2nd in Task 1, 2nd in Task 2, and 1st in Task 3, securing the grand prize for excellence in ego-centric queries through superior handling of first-person-perspective challenges.