Though large language models (LLMs) have demonstrated exceptional performance across numerous problems, their application to predictive tasks on relational databases remains largely unexplored. In this work, we challenge the notion that LLMs cannot yield satisfactory results on relational databases due to their interconnected tables, complex relationships, and heterogeneous data types. Using the recently introduced RelBench benchmark, we demonstrate that even a straightforward application of LLMs achieves competitive performance on these tasks. These findings establish LLMs as a promising new baseline for machine learning on relational databases and encourage further research in this direction.
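One way such a "straightforward application" of an LLM can look in practice is to serialize an entity and its linked rows from related tables into a flat text prompt and pose the prediction task as a natural-language question. The sketch below illustrates this idea only; the table names, columns, and task are invented for illustration and are not RelBench's actual schema or the method used in the paper.

```python
# Hypothetical sketch: flatten a relational record and its linked rows
# into a plain-text prompt for zero-shot LLM prediction. All names here
# (tables, columns, the question) are illustrative assumptions.

def serialize_entity(user, orders):
    """Flatten a user row and its related order rows into one text block."""
    lines = [f"user_id: {user['user_id']}, signup: {user['signup']}"]
    for o in orders:
        lines.append(f"  order: {o['item']} on {o['date']} for ${o['price']}")
    return "\n".join(lines)

def build_prompt(user, orders, question):
    """Compose a zero-shot prediction prompt from the serialized record."""
    return (
        "Below is a customer's purchase history.\n\n"
        + serialize_entity(user, orders)
        + f"\n\nQuestion: {question}\nAnswer yes or no:"
    )

# Toy data standing in for two joined tables (users, orders).
user = {"user_id": 42, "signup": "2021-03-01"}
orders = [
    {"item": "laptop", "date": "2021-04-10", "price": 999},
    {"item": "mouse", "date": "2021-05-02", "price": 25},
]
prompt = build_prompt(user, orders, "Will this user place an order next month?")
print(prompt)
```

The resulting prompt would then be sent to an LLM, whose yes/no answer serves as the prediction; the serialization step is where the interconnected tables are collapsed into the flat text an LLM consumes.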