Relational data stored in relational database management systems (RDBMSs) underpins many real-world applications in domains such as e-commerce, finance, and social networks. While deep neural networks (DNNs) have achieved strong performance on single-table tabular data, extending them to relational databases is challenging because of the normalized multi-table structure and complex inter-table relationships. Existing approaches typically rely strictly on schema-defined graphs, which overlook implicit semantic signals embedded in tuple attributes and suffer from rigid connectivity. In this work, we propose Retrieval-Augmented Modeling (RAM), a novel framework that combines graph structure with attribute semantics for relational data analytics. RAM treats tuple attributes as tokens and uses random walks to construct contextual documents, enabling information retrieval techniques to estimate semantic relevance between tuples. Building on these documents, we introduce two retrieval-based augmentations: ATRA, which leverages intra-table relevance for contrastive learning, and ETRA, which links semantically related tuples across tables to enhance graph connectivity. We then propose a layer-wise model architecture tailored to relational data, comprising attribute embedding, feature integration, and graph aggregation layers, to enable expressive and flexible representation learning. Extensive experiments on five real-world relational databases demonstrate that RAM consistently outperforms existing baselines across diverse prediction tasks, establishing a new state of the art for relational data analytics.
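To make the document-construction step more concrete, here is a minimal sketch under stated assumptions; it is not RAM's actual implementation. The abstract says only that attributes are treated as tokens, that random walks build contextual documents, and that information retrieval techniques score relevance. Everything else below is hypothetical: the toy `user`/`order` tables, the `table.column=value` token format, the walk parameters, and the use of smoothed TF-IDF with cosine similarity as a stand-in for whatever retrieval scoring the paper actually uses. The final helper picks the most relevant tuple in the same table, which illustrates the kind of intra-table signal ATRA is described as exploiting.

```python
import math
import random
from collections import Counter

# Hypothetical toy database: tuple id -> (table name, attribute dict).
TUPLES = {
    "u1": ("user", {"country": "DE", "tier": "gold"}),
    "u2": ("user", {"country": "DE", "tier": "gold"}),
    "u3": ("user", {"country": "US", "tier": "free"}),
    "o1": ("order", {"category": "books"}),
    "o2": ("order", {"category": "books"}),
    "o3": ("order", {"category": "games"}),
}
# Hypothetical foreign-key links (order -> user), walked in both directions.
EDGES = [("o1", "u1"), ("o2", "u2"), ("o3", "u3")]


def build_adjacency(edges):
    """Undirected adjacency list over tuple ids (schema-defined links only)."""
    adj = {t: [] for t in TUPLES}
    for a, b in edges:
        adj[a].append(b)
        adj[b].append(a)
    return adj


def tokens_of(tid):
    """Treat each attribute value as a token, qualified by table and column (assumed format)."""
    table, attrs = TUPLES[tid]
    return [f"{table}.{col}={val}" for col, val in attrs.items()]


def walk_document(tid, adj, rng, num_walks=4, walk_len=3):
    """Concatenate the tokens of tid and of every tuple visited by short random walks from it."""
    doc = list(tokens_of(tid))
    for _ in range(num_walks):
        cur = tid
        for _ in range(walk_len):
            if not adj[cur]:
                break
            cur = rng.choice(adj[cur])
            doc.extend(tokens_of(cur))
    return doc


def tfidf_vectors(docs):
    """Smoothed TF-IDF weights per document; a stand-in scoring scheme, not necessarily RAM's."""
    n = len(docs)
    df = Counter(tok for doc in docs.values() for tok in set(doc))
    return {
        tid: {tok: cnt * (math.log((1 + n) / (1 + df[tok])) + 1.0)
              for tok, cnt in Counter(doc).items()}
        for tid, doc in docs.items()
    }


def cosine(a, b):
    """Cosine similarity between two sparse weight dicts."""
    dot = sum(w * b.get(tok, 0.0) for tok, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def most_relevant_same_table(tid, vecs):
    """Highest-scoring other tuple from the same table (an intra-table relevance signal)."""
    table = TUPLES[tid][0]
    candidates = [o for o in vecs if o != tid and TUPLES[o][0] == table]
    return max(candidates, key=lambda o: cosine(vecs[tid], vecs[o]))


adj = build_adjacency(EDGES)
rng = random.Random(42)  # seeded so the walks are reproducible
docs = {tid: walk_document(tid, adj, rng) for tid in TUPLES}
vecs = tfidf_vectors(docs)
print(most_relevant_same_table("u1", vecs))  # u2
```

In this toy graph `u1` and `u2` share attributes and their linked orders share a category, so their walk documents coincide and `u2` comes out as the most relevant `user` tuple. `u3` shares no tokens with `u1` and scores zero. Because the documents mix in tokens from linked tables, the same scores could also be computed between tuples of different tables, which is the setting ETRA is described as addressing.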