One limitation of large language models (LLMs) is that they do not have access to up-to-date, proprietary, or personal data. As a result, there are multiple efforts to extend language models with techniques for accessing external data. In that sense, LLMs share the vision of data integration systems, whose goal is to provide seamless access to a large collection of heterogeneous data sources. While the details and techniques of LLMs differ greatly from those of data integration, this paper shows that some of the lessons learned from research on data integration can elucidate the research path we are pursuing today on language models.