As businesses, products, and services spring up around large language models, the trustworthiness of these models hinges on the verifiability of their outputs. However, methods for explaining language model outputs fall largely into two distinct fields of study, both of which use the term "attribution" to refer to entirely separate techniques: citation generation and training data attribution. In many modern applications, such as legal document generation and medical question answering, both types of attribution are important. In this work, we argue for and present a unified framework of large language model attributions. We show how existing methods for each type of attribution fit within this framework. We also use the framework to discuss real-world use cases in which one or both types of attribution are required. We believe that this unified framework will guide the use-case-driven development of systems that leverage both types of attribution, as well as the standardization of their evaluation.