Large language models (LLMs) show strong reasoning abilities across diverse tasks, yet their performance on extended contexts remains inconsistent. While prior research has emphasized mid-context degradation in question answering, this study examines the impact of context in LLM-based fact verification. Using three datasets (HOVER, FEVEROUS, and ClimateFEVER) and five open-source models across different parameter sizes (7B, 32B, and 70B) and model families (Llama-3.1, Qwen2.5, and Qwen3), we evaluate both parametric factual knowledge and the impact of evidence placement across varying context lengths. We find that LLMs exhibit non-trivial parametric knowledge of factual claims and that their verification accuracy generally declines as context length increases. Consistent with prior work, in-context evidence placement plays a critical role: accuracy is consistently higher when relevant evidence appears near the beginning or end of the prompt and lower when it is placed mid-context. These results underscore the importance of prompt structure in retrieval-augmented fact-checking systems.