Automatic factuality verification of large language model (LLM) generations is increasingly used to combat hallucinations. A major point of tension in the literature is the granularity of this fact-checking: larger chunks of text are hard to fact-check, but more atomic facts such as propositions may lack the context needed to interpret them correctly. In this work, we assess the role of context in these atomic facts. We argue that fully atomic facts are not the right representation, and define two criteria for molecular facts: decontextuality, or how well they can stand alone, and minimality, or how little extra information is added to achieve decontextuality. We quantify the impact of decontextualization on minimality, then present a baseline methodology for generating molecular facts automatically, aiming to add the right amount of information. We compare against various methods of decontextualization and find that molecular facts balance minimality with fact verification accuracy in ambiguous settings.