Relational databases (RDBs) play a crucial role in many real-world web applications, supporting data management across multiple interconnected tables. Beyond typical retrieval-oriented tasks, prediction tasks on RDBs have recently gained attention. In this work, we address such prediction tasks by generating informative relational features that enhance predictive performance. However, generating such features is challenging: it requires reasoning over complex schemas and exploring a combinatorially large feature space, all without explicit supervision. To address these challenges, we propose ReFuGe, an agentic framework that leverages three specialized large language model agents: (1) a schema selection agent identifies the tables and columns relevant to the task, (2) a feature generation agent produces diverse candidate features from the selected schema, and (3) a feature filtering agent evaluates and retains promising features through reasoning-based and validation-based filtering. The agents operate within an iterative feedback loop that repeats until performance converges. Experiments on RDB benchmarks demonstrate that ReFuGe substantially improves performance on various RDB prediction tasks. Our code and datasets are available at https://github.com/K-Kyungho/REFUGE.
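To make the control flow concrete, the select, generate, filter loop can be sketched as below. This is a minimal illustration under stated assumptions, not the ReFuGe implementation: every function name (`select_schema`, `generate_features`, `filter_features`, `run_loop`, `toy_score`), the toy schema, and the scoring rule are hypothetical, the LLM agents are replaced by deterministic stand-ins, and only validation-based filtering is shown (the reasoning-based filter is omitted).

```python
from typing import Callable, Dict, List, Tuple

Schema = Dict[str, List[str]]  # table name -> column names (hypothetical format)


def select_schema(schema: Schema, task_keywords: List[str]) -> Schema:
    """Stand-in for the schema selection agent: keep columns matching a keyword."""
    selected: Schema = {}
    for table, columns in schema.items():
        kept = [c for c in columns if any(k in c for k in task_keywords)]
        if kept:
            selected[table] = kept
    return selected


def generate_features(selected: Schema, round_idx: int) -> List[str]:
    """Stand-in for the feature generation agent: one aggregate type per round."""
    aggregates = ["mean", "count", "max"]
    agg = aggregates[round_idx % len(aggregates)]
    return [f"{agg}({t}.{c})" for t, cols in selected.items() for c in cols]


def filter_features(
    candidates: List[str], kept: List[str], score_fn: Callable[[List[str]], float]
) -> Tuple[List[str], float]:
    """Validation-based filtering: keep a candidate only if the score improves."""
    best = score_fn(kept)
    for feature in candidates:
        if feature in kept:
            continue
        score = score_fn(kept + [feature])
        if score > best:
            kept, best = kept + [feature], score
    return kept, best


def run_loop(
    schema: Schema,
    task_keywords: List[str],
    score_fn: Callable[[List[str]], float],
    max_rounds: int = 10,
    tol: float = 1e-6,
) -> Tuple[List[str], float]:
    """Repeat select -> generate -> filter until the score stops improving."""
    kept: List[str] = []
    best = score_fn(kept)
    for round_idx in range(max_rounds):
        selected = select_schema(schema, task_keywords)
        candidates = generate_features(selected, round_idx)
        kept, new_best = filter_features(candidates, kept, score_fn)
        if new_best - best <= tol:  # converged: no meaningful gain this round
            break
        best = new_best
    return kept, best


def toy_score(features: List[str]) -> float:
    """Toy validation score (an assumption): reward two features, penalize the rest."""
    useful = {"mean(orders.amount)", "count(orders.amount)"}
    chosen = set(features)
    return len(useful & chosen) - 0.1 * len(chosen - useful)


toy_schema = {"users": ["user_id", "age"], "orders": ["order_id", "user_id", "amount"]}
features, score = run_loop(toy_schema, ["amount", "age"], toy_score)
```

In this sketch the only feedback passed between rounds is the retained feature set and the round index. A real agentic loop would presumably feed richer signals, such as validation results, back to the agents.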