KITLM: Domain-Specific Knowledge InTegration into Language Models for Question Answering

Large language models (LLMs) have demonstrated remarkable performance across a wide range of natural language tasks. However, as these models continue to grow in size, their computational costs become a significant challenge. LLMs also often lack efficient domain-specific understanding, which is particularly crucial in specialized fields such as aviation and healthcare. To boost domain-specific understanding, we propose KITLM, a novel approach that integrates a knowledge base into a language model through relevant information infusion. Integrating pertinent knowledge not only greatly enhances the performance of the language model but also significantly reduces the model size required to achieve comparable performance. Our proposed knowledge-infused model surpasses both GPT-3.5-turbo and the state-of-the-art knowledge infusion method, SKILL, achieving an improvement of over 1.5 times in exact match score on MetaQA. KITLM shows a similar performance boost in the aviation domain on AeroQA. The drastic performance improvement of KITLM over existing methods can be attributed to the infusion of relevant knowledge while mitigating noise. In addition, we release two curated datasets to accelerate knowledge infusion research in specialized fields: a) AeroQA, a new benchmark dataset designed for multi-hop question answering in the aviation domain, and b) Aviation Corpus, a dataset constructed from unstructured text extracted from National Transportation Safety Board reports. Our research contributes to advancing domain-specific language understanding and showcases the potential of knowledge infusion techniques for improving the performance of language models on question answering.
