Relational databases store much of the world's structured information and are essential for driving complex predictive applications. However, deep learning progress on relational data remains limited: conventional approaches flatten databases into single tables via manual feature engineering, discarding relational context. Relational deep learning (RDL) addresses this by modeling databases as relational entity graphs (REGs) that are processed by graph neural networks (GNNs), but its models remain task- and database-specific. To combine the strengths of both paradigms, we propose a hybrid architecture that pairs a fine-tuned BART encoder, which captures intra-row semantics, with a GraphSAGE-based GNN that operates over REGs to inject relational context. Experiments on RelBench show that the GNN substantially enriches BART's row embeddings, achieving a ROC-AUC of 67.40 on the driver-dnf task of the rel-f1 dataset. This performance is competitive with supervised baselines such as LightGBM (68.86) and brings the model within 5.22 points of RDL (72.62), though a substantial gap remains to state-of-the-art foundation models such as KumoRFM (82.63). These results suggest that lightweight hybrid LM-GNN architectures offer a promising, resource-efficient path towards foundation models for relational databases.
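The data flow of the hybrid architecture can be illustrated with a small NumPy sketch. It assumes that each table row has already been encoded into a fixed-size vector (here random placeholders stand in for fine-tuned BART row embeddings) and that the REG has one node per row and one undirected edge per primary-foreign key link. It then applies GraphSAGE layers with a mean aggregator and a sigmoid head on the driver nodes. The toy graph, the dimensions, the layer count, and all function names (`build_adjacency`, `sage_layer`, `init_params`, `predict_dnf`) are hypothetical. The weights are untrained, so the sketch shows only how relational context flows into row embeddings and does not reproduce the reported results.

```python
import numpy as np

def build_adjacency(num_nodes, edges):
    """Row-normalized adjacency for mean aggregation over an undirected REG.

    edges: (u, v) node-index pairs, one per primary-foreign key link (assumed).
    """
    adj = np.zeros((num_nodes, num_nodes))
    for u, v in edges:
        adj[u, v] = 1.0
        adj[v, u] = 1.0  # PK-FK links treated as undirected in this sketch
    deg = adj.sum(axis=1, keepdims=True)
    # Isolated nodes get a zero neighbor summary instead of dividing by zero.
    return np.divide(adj, deg, out=np.zeros_like(adj), where=deg > 0)

def sage_layer(h, adj_mean, w_self, w_neigh):
    """One GraphSAGE layer: mean aggregator, ReLU, then L2 normalization."""
    neigh = adj_mean @ h                              # mean of neighbor embeddings
    out = np.maximum(h @ w_self + neigh @ w_neigh, 0.0)
    norm = np.linalg.norm(out, axis=1, keepdims=True)
    return out / np.maximum(norm, 1e-12)

def init_params(in_dim, hidden_dim, num_layers, rng):
    """Random (untrained) weights for the GNN layers and a linear output head."""
    params, dim = [], in_dim
    for _ in range(num_layers):
        params.append((rng.normal(scale=dim ** -0.5, size=(dim, hidden_dim)),
                       rng.normal(scale=dim ** -0.5, size=(dim, hidden_dim))))
        dim = hidden_dim
    w_out = rng.normal(scale=hidden_dim ** -0.5, size=(hidden_dim,))
    return params, w_out

def predict_dnf(row_emb, edges, target_idx, params, w_out):
    """Enrich row embeddings with relational context, then score target rows."""
    adj_mean = build_adjacency(row_emb.shape[0], edges)
    h = row_emb
    for w_self, w_neigh in params:
        h = sage_layer(h, adj_mean, w_self, w_neigh)
    logits = h[target_idx] @ w_out
    return 1.0 / (1.0 + np.exp(-logits))              # per-driver probability

rng = np.random.default_rng(0)
# Hypothetical toy REG: nodes 0-1 = drivers, 2-5 = results rows, 6-7 = races.
edges = [(2, 0), (3, 0), (4, 1), (5, 1), (2, 6), (4, 6), (3, 7), (5, 7)]
row_emb = rng.normal(size=(8, 16))  # stand-in for fine-tuned BART row embeddings
params, w_out = init_params(16, 32, num_layers=2, rng=rng)
probs = predict_dnf(row_emb, edges, [0, 1], params, w_out)
print(probs.shape)  # (2,)
```

In this sketch the language model and the GNN are decoupled: the row encoder only has to produce one vector per row, and the GNN adds context from linked rows through two hops of neighbor averaging. Training the head and GNN weights against driver-dnf labels would require an autodiff framework and is outside the scope of the sketch.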