Integrating external knowledge into large language models (LLMs) presents a promising way to overcome the limitations imposed by their outdated and static parametric memory. Prior studies, however, have tended to over-rely on external knowledge, underestimating the valuable contribution of LLMs' intrinsic parametric knowledge. How effectively LLMs blend external and parametric knowledge remains largely unexplored, especially in cases where external knowledge is incomplete and must be supplemented by parametric knowledge. We propose deconstructing knowledge fusion into four distinct scenarios, offering the first thorough investigation of LLM behavior in each. We develop a systematic pipeline for data construction and knowledge infusion to simulate these fusion scenarios, enabling a series of controlled experiments. Our investigation reveals that enhancing parametric knowledge within LLMs can significantly bolster their capacity for knowledge integration. Nonetheless, we identify persistent challenges in memorizing and eliciting parametric knowledge, and in determining parametric knowledge boundaries. Our findings aim to steer future exploration of harmonizing external and parametric knowledge within LLMs.