Language models (LMs) have demonstrated remarkable abilities in understanding and generating both natural and formal language. Despite these advances, their integration with real-world environments such as large-scale knowledge bases (KBs) remains underdeveloped, which hampers applications such as semantic parsing and leaves models prone to producing "hallucinated" information. This paper presents an experimental investigation of the robustness challenges that LMs encounter when tasked with knowledge base question answering (KBQA). The investigation covers scenarios in which the data distribution differs between training and inference, including generalization to unseen domains, adaptation to language variations, and transferability across datasets. Our comprehensive experiments reveal that, even when combined with our proposed data augmentation techniques, advanced small and large language models perform poorly along various dimensions. While LMs are a promising technology, their robustness in complex environments is currently fragile, and their practicality is limited by the data distribution issue. This calls for future research on data collection and LM learning paradigms.