Large Language Models (LLMs) draw their knowledge from data snapshots containing factual information about entities, collected at different timestamps and from different media types (e.g., wikis, social media). Such unstructured knowledge is subject to change as sources are updated over time. Equally important are the inconsistencies and inaccuracies that occur across different information sources. Consequently, a model's knowledge about an entity may be perturbed while training over a sequence of snapshots or at inference time, resulting in inconsistent and inaccurate model behavior. In this work, we study the suitability of LLMs as repositories of factual knowledge. We consider twenty-four state-of-the-art LLMs that are closed-source, partially open-source (weights), or fully open-source (weights and training data). We evaluate their reliability in answering time-sensitive factual questions, measuring accuracy and consistency under prompt perturbations. We further evaluate the effectiveness of state-of-the-art methods for improving LLMs' accuracy and consistency. We then propose ENtity-Aware Fine-tuning (ENAF), a soft neurosymbolic approach that provides a structured representation of entities during fine-tuning to reduce inconsistencies and improve response stability under prompt variations.