We present Chain-of-Action (CoA), a framework for multimodal and retrieval-augmented question answering (QA). Compared with prior work, CoA overcomes two major challenges in current QA applications: (i) unfaithful hallucination, where answers are inconsistent with real-time or domain facts, and (ii) weak reasoning over compositional information. Our key contribution is a novel reasoning-retrieval mechanism that decomposes a complex question into a reasoning chain via systematic prompting and pre-designed actions. Methodologically, we propose three types of domain-adaptable "Plug-and-Play" actions for retrieving real-time information from heterogeneous sources. We also propose a multi-reference faith score (MRFS) to verify answers and resolve conflicts among them. Empirically, we use both public benchmarks and a Web3 case study to demonstrate CoA's advantage over other methods.