Large Language Models (LLMs) have significantly advanced natural language processing (NLP) with their impressive language understanding and generation capabilities. However, their performance may be suboptimal on domain-specific tasks that require specialized knowledge, because they have had limited exposure to the relevant data. Additionally, most state-of-the-art (SOTA) LLMs are not transparent and can be accessed only via APIs, which impedes further fine-tuning on custom domain data. Moreover, providing private data to the LLMs' owner raises data privacy concerns. To address these challenges, we propose the novel Parametric Knowledge Guiding (PKG) framework, which equips LLMs with a knowledge-guiding module that gives them access to relevant knowledge without altering the LLMs' parameters. Our PKG is based on open-source "white-box" language models, allowing offline storage of any knowledge that LLMs require. We demonstrate that our PKG framework can enhance the performance of "black-box" LLMs on a range of domain knowledge-intensive tasks that require factual (+7.9%), tabular (+11.9%), medical (+3.0%), and multimodal (+8.1%) knowledge.
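To make the division of labor concrete, the following is a minimal sketch of how a knowledge-guiding module might sit in front of an API-only LLM. It is an illustration under stated assumptions, not the implementation evaluated in this work: the names `answer_with_guidance`, `build_prompt`, `toy_guide`, and `toy_llm` are hypothetical, both models are reduced to text-in, text-out callables, and the prompt layout is an assumption. The only properties taken from the description above are that the guide is a local white-box model and that the black-box LLM is queried with its parameters unchanged.

```python
from typing import Callable

# Hypothetical type aliases: both models are treated as text-in, text-out callables.
KnowledgeGuide = Callable[[str], str]  # local white-box model holding domain knowledge
BlackBoxLLM = Callable[[str], str]     # API-only model whose parameters are never touched


def build_prompt(question: str, background: str) -> str:
    """Combine guide-produced background with the question (assumed prompt layout)."""
    return f"Background: {background}\nQuestion: {question}\nAnswer:"


def answer_with_guidance(question: str, guide: KnowledgeGuide, llm: BlackBoxLLM) -> str:
    """Ask the local guide for background, then query the black-box LLM with it."""
    background = guide(question)
    return llm(build_prompt(question, background))


# Toy stand-ins so the sketch runs without any model or network access.
def toy_guide(question: str) -> str:
    return "Aspirin is a nonsteroidal anti-inflammatory drug."


def toy_llm(prompt: str) -> str:
    # Returns the first prompt line to show that the background reached the LLM.
    return prompt.splitlines()[0]


print(answer_with_guidance("What class of drug is aspirin?", toy_guide, toy_llm))
```

In this arrangement, private domain data would only ever be used to train the local guide, and the remote LLM sees just the generated background and the question.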