Generative AI tools often answer questions using source documents, e.g., through retrieval-augmented generation. Current groundedness and hallucination evaluations largely frame the relationship between an answer and its sources as binary: the answer is either supported or unsupported. However, this framing obscures both the syntactic moves (e.g., direct quotation vs. paraphrase) and the interpretive moves (e.g., induction vs. deduction) that models perform when reformulating evidence into an answer, limiting both benchmarking and user-facing provenance interfaces. We propose developing a reader-centred taxonomy of grounding as a set of support relations between generated statements and source documents. We explain how this taxonomy might be synthesised from prior research in linguistics and the philosophy of language, and evaluated through a benchmark and a human annotation protocol. Such a framework would enable interfaces that communicate not just whether a claim is grounded, but how.