Large pre-trained language models have been shown to store factual knowledge in their parameters and to achieve remarkable results when fine-tuned on downstream natural language processing tasks. However, their ability to access and precisely manipulate knowledge remains limited, so on knowledge-intensive tasks they still lag behind task-specific architectures. Providing provenance for their decisions and keeping their world knowledge up to date also remain open research problems. A promising way to address these limitations is to combine pre-trained models with a differentiable access mechanism to explicit, non-parametric memory. This survey reviews language models (LMs) augmented with the ability to draw on external knowledge sources, including knowledge bases and search engines. While keeping the standard objective of predicting missing tokens, these augmented LMs use diverse, possibly non-parametric, external modules to expand their context, which departs from the conventional language modeling paradigm. After examining recent progress in augmenting large language models with knowledge, this work concludes that this emerging research direction could address common problems of traditional LMs, such as hallucinations, ungrounded responses, and limited scalability.
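The idea of a differentiable access mechanism to non-parametric memory can be illustrated with a toy sketch in the style of retrieval-augmented generation (RAG). This is an illustration under stated assumptions, not the method of any surveyed system. The sketch scores stored passages against a query embedding, turns the top-k scores into a softmax distribution p(z | x), and marginalizes an answer probability as p(y | x) = sum over z of p(z | x) * p(y | x, z). The memory contents, the 3-dimensional embeddings, and the stand-in generator `answer_prob_given_passage` are all hypothetical. Real systems use learned encoders, large passage indexes, and a neural generator.

```python
import math

# Hypothetical non-parametric memory: passage text -> toy 3-d embedding.
MEMORY = {
    "Paris is the capital of France.": [0.9, 0.1, 0.0],
    "The Eiffel Tower is in Paris.": [0.7, 0.3, 0.1],
    "Mount Fuji is in Japan.": [0.0, 0.2, 0.9],
}


def dot(u, v):
    """Inner-product relevance score between two vectors."""
    return sum(a * b for a, b in zip(u, v))


def softmax(xs):
    """Numerically stable softmax; differentiable w.r.t. the scores."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def retrieve(query_vec, k=2):
    """Return the top-k passages and p(z | x), a softmax over their scores."""
    ranked = sorted(MEMORY.items(), key=lambda item: dot(query_vec, item[1]), reverse=True)[:k]
    weights = softmax([dot(query_vec, emb) for _, emb in ranked])
    return [passage for passage, _ in ranked], weights


def marginal_answer_prob(query_vec, answer_prob_given_passage, k=2):
    """p(y | x) = sum_z p(z | x) * p(y | x, z) over the retrieved passages."""
    passages, weights = retrieve(query_vec, k)
    return sum(w * answer_prob_given_passage(p) for p, w in zip(passages, weights))


# Hypothetical stand-in for a generator's probability of the answer "Paris".
def toy_generator(passage):
    return 0.9 if "capital" in passage else 0.1


prob = marginal_answer_prob([1.0, 0.0, 0.0], toy_generator)
```

The top-k selection itself is not differentiable. The softmax weights are, however, so in RAG-style training, gradients from the answer likelihood can reach the retriever's scores. Swapping the contents of `MEMORY` updates the model's accessible knowledge without retraining, which is the motivation for non-parametric memory given above.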