Open-domain generative systems, such as generative search engines, have gained significant attention in conversational AI. This paper presents a comprehensive review of the attribution mechanisms employed by these systems, particularly large language models. Although attribution, or citation, improves factuality and verifiability, issues such as ambiguous knowledge reservoirs, inherent biases, and the drawbacks of excessive attribution can hinder the effectiveness of these systems. This survey aims to give researchers insights that help refine attribution methodologies and thereby enhance the reliability and veracity of responses generated by open-domain generative systems. We believe that this field is still in its early stages; hence, we maintain a repository to keep track of ongoing studies at https://github.com/HITsz-TMG/awesome-llm-attributions.