We introduce RFC Bench, a benchmark for evaluating large language models on financial misinformation in realistic news settings. RFC Bench operates at the paragraph level and captures the contextual complexity of financial news, where meaning emerges from cues dispersed across a passage. The benchmark defines two complementary tasks: reference-free misinformation detection, and comparison-based diagnosis using paired original and perturbed inputs. Experiments reveal a consistent pattern: performance is substantially stronger when comparative context is available, whereas reference-free settings expose significant weaknesses, including unstable predictions and elevated rates of invalid outputs. These results indicate that current models struggle to maintain coherent belief states without external grounding. By highlighting this gap, RFC Bench provides a structured testbed for studying reference-free reasoning and advancing more reliable financial misinformation detection in real-world settings.
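To make the two task settings concrete, the following is a minimal sketch of how inputs for each might be constructed and how accuracy and the invalid-output rate could be computed. It is illustrative only and is not the benchmark's actual implementation: the `Example` type, the prompt wording, the `yes`/`no` label set, and the functions `reference_free_prompt`, `comparison_prompt`, `parse_label`, and `score` are all assumptions introduced here.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional

# Hypothetical label set; the benchmark's real answer format may differ.
VALID_LABELS = {"yes", "no"}


@dataclass
class Example:
    original: str   # unmodified news paragraph
    perturbed: str  # same paragraph with a perturbation applied


def reference_free_prompt(paragraph: str) -> str:
    """Reference-free setting: the model sees one paragraph and no reference."""
    return (
        "Does the following financial news paragraph contain misinformation? "
        "Answer yes or no.\n\nPARAGRAPH:\n" + paragraph
    )


def comparison_prompt(ex: Example) -> str:
    """Comparison-based setting: the model sees the original and perturbed pair."""
    return (
        "Compare the two paragraphs and state whether the second one "
        "introduces misinformation. Answer yes or no.\n\n"
        "ORIGINAL:\n" + ex.original + "\n\nPERTURBED:\n" + ex.perturbed
    )


def parse_label(raw: str) -> Optional[str]:
    """Normalize a raw model reply; return None when it is not a valid label."""
    token = raw.strip().lower().rstrip(".")
    return token if token in VALID_LABELS else None


def score(raw_outputs: List[str], gold: List[str]) -> Dict[str, float]:
    """Compute accuracy and invalid-output rate; invalid replies count as wrong."""
    parsed = [parse_label(r) for r in raw_outputs]
    n = len(gold)
    correct = sum(p == g for p, g in zip(parsed, gold))
    invalid = sum(p is None for p in parsed)
    return {"accuracy": correct / n, "invalid_rate": invalid / n}
```

Under these assumptions, a reply that cannot be mapped to a label lowers accuracy and raises the invalid-output rate at the same time, which is one simple way to surface the kind of invalid outputs the abstract reports for the reference-free setting.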