Judgmental forecasting is the task of making predictions about future events based on human judgment. This task can be seen as a form of claim verification, where the claim corresponds to a future event and the task is to assess the plausibility of that event. In this paper, we propose a novel multi-agent framework for claim verification, whereby different agents may disagree on claim veracity and bring specific evidence for and against the claims, represented as quantitative bipolar argumentation frameworks (QBAFs). We then instantiate the framework for claim verification with a variety of agents realised with Large Language Models (LLMs): (1) ArgLLM agents, an existing approach for claim verification that generates and evaluates QBAFs; (2) RbAM agents, whereby LLM-empowered Relation-based Argument Mining (RbAM) from external sources is used to generate QBAFs; (3) RAG-ArgLLM agents, extending ArgLLM agents with a form of Retrieval-Augmented Generation (RAG) of arguments from external sources. Finally, we conduct experiments on two standard judgmental forecasting datasets, instantiating our framework with two or three agents, each empowered by one of six different base LLMs. We observe that combining evidence from agents can improve forecasting accuracy, especially in the case of three agents, while providing an explainable combination of evidence for claim verification.