Misinformation, particularly in political discourse, is a growing problem that calls for advanced fact-checking solutions, and the need is sharper still for multimodal claims, which combine text and images. We tackle this issue using a multimodal large language model in conjunction with retrieval-augmented generation (RAG), and introduce two novel reasoning techniques: Chain of RAG (CoRAG) and Tree of RAG (ToRAG). Both approaches fact-check multimodal claims by extracting textual and image content, retrieving external information, and reasoning about which follow-up questions to answer next based on the evidence gathered so far. We achieve a weighted F1-score of 0.85, surpassing a baseline reasoning technique by 0.14 points. Human evaluation confirms that the vast majority of our generated fact-check explanations contain all information from gold standard data.
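To make the chained reasoning concrete, the loop described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation evaluated here: the names `Claim`, `chain_of_rag`, `next_question`, and `retrieve_answer` are hypothetical, the multimodal model and the retriever are replaced by plain callables, and the stopping rule (the reasoner returns `None` when it considers the evidence sufficient, or a step limit is reached) is assumed.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple


@dataclass
class Claim:
    text: str            # textual part of the claim
    image_caption: str   # stand-in for content extracted from the image


# One piece of evidence: (question asked, answer retrieved)
QA = Tuple[str, str]


def chain_of_rag(
    claim: Claim,
    next_question: Callable[[Claim, List[QA]], Optional[str]],
    retrieve_answer: Callable[[str], str],
    max_steps: int = 5,
) -> List[QA]:
    """Collect question-answer evidence one step at a time.

    Each new question is generated from the claim plus all evidence
    gathered so far, so later questions can build on earlier answers.
    """
    evidence: List[QA] = []
    for _ in range(max_steps):
        question = next_question(claim, evidence)
        if question is None:  # assumed stop signal: evidence is sufficient
            break
        evidence.append((question, retrieve_answer(question)))
    return evidence


# Toy stand-ins for the language model and the retriever (hypothetical).
def toy_next_question(claim: Claim, evidence: List[QA]) -> Optional[str]:
    scripted = ["Who made the statement?", "What does the image show?"]
    return scripted[len(evidence)] if len(evidence) < len(scripted) else None


def toy_retrieve(question: str) -> str:
    return f"(retrieved answer to: {question})"


claim = Claim(text="Example claim", image_caption="Example image description")
trail = chain_of_rag(claim, toy_next_question, toy_retrieve)
for q, a in trail:
    print(q, "->", a)
```

In a real system, `next_question` would prompt the multimodal model with the claim and the evidence trail, and `retrieve_answer` would query an external search or document index. The final verdict and explanation would then be generated from the collected `trail`.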