Factual recall from a reference source is crucial for evaluating Retrieval Augmented Generation (RAG) systems, because it directly probes the quality of both retrieval and generation. Performing this evaluation reliably and efficiently, however, remains a challenge. Recent work has focused on fact verification by prompting language model (LM) evaluators, but we demonstrate that these methods are unreliable in the presence of incomplete or inaccurate information. We introduce Facts as a Function (FaaF), a new approach to fact verification that uses the function-calling abilities of LMs, together with a framework for RAG factual recall evaluation. Compared to prompt-based approaches, FaaF substantially improves the ability of LMs to identify unsupported facts in text with incomplete information, while improving efficiency and lowering cost by several times.
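To make the idea of passing facts through a function call concrete, the following is a minimal sketch and not the authors' implementation. It assumes a JSON-schema style function definition of the kind commonly used for LM function calling, with one parameter per fact. The names `build_fact_function`, `factual_recall`, `verify_facts`, and the `fact_i` keys are hypothetical, and the LM response is mocked as a JSON string so the example runs without any API.

```python
import json


def build_fact_function(facts):
    """Build a function definition with one True/False parameter per fact.

    Assumption: the schema layout follows the common JSON-schema convention
    for LM function calling; it is illustrative, not the paper's format.
    """
    properties = {}
    for i, fact in enumerate(facts):
        properties[f"fact_{i}"] = {
            "type": "string",
            "enum": ["True", "False"],
            "description": fact,  # the fact statement itself is the field description
        }
    return {
        "name": "verify_facts",  # hypothetical function name
        "description": "Mark each fact True only if the given text supports it.",
        "parameters": {
            "type": "object",
            "properties": properties,
            "required": list(properties),
        },
    }


def factual_recall(arguments_json, n_facts):
    """Fraction of facts marked "True" in the returned function arguments."""
    if n_facts <= 0:
        raise ValueError("n_facts must be positive")
    args = json.loads(arguments_json)
    supported = sum(1 for i in range(n_facts) if args.get(f"fact_{i}") == "True")
    return supported / n_facts


facts = [
    "The Eiffel Tower is in Paris.",
    "The Eiffel Tower was completed in 1889.",
]
schema = build_fact_function(facts)

# Mocked LM output: in practice this string would come from a function-calling LM
# that was given `schema` and the generated text to verify.
mock_arguments = '{"fact_0": "True", "fact_1": "False"}'
recall = factual_recall(mock_arguments, len(facts))
print(recall)
```

Under these assumptions, all facts are verified in a single structured call rather than one prompt per fact, which is one plausible way such an approach could reduce the number of LM calls.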