The rapid development of LLMs has significantly advanced tabular question answering, but most existing systems cannot perform future-oriented numerical prediction. To address this gap, we introduce a novel task, Open-Domain Tabular Question Answering for Future Data Forecasting and Reasoning, and propose the first dataset covering time-series forecasting and forecast-based reasoning scenarios, built from real estate data. The task poses three challenges: retrieving precise historical data, overcoming the forecasting limitations of LLMs, and standardizing responses across diverse queries. To address these challenges, we propose TimeFore, an LLM agent-based framework that decomposes the problem into three collaborative roles: a Retriever that autonomously generates SQL to fetch data, a Forecaster that invokes external time-series models for higher accuracy, and an Analyzer that synthesizes the results into a precise and consistent final answer. Extensive experiments demonstrate the effectiveness of TimeFore.
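The three-role decomposition described above can be illustrated with a minimal sketch. This is not the TimeFore implementation: the table schema, the city name, and the functions `build_demo_db`, `retrieve`, `forecast`, and `analyze` are all hypothetical. The SQL is fixed rather than LLM-generated, and a least-squares linear trend stands in for the external time-series model.

```python
import sqlite3


def build_demo_db() -> sqlite3.Connection:
    """Create an in-memory table of made-up yearly prices (illustrative data only)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE prices (city TEXT, year INTEGER, avg_price REAL)")
    rows = [
        ("Springfield", 2019, 100.0),
        ("Springfield", 2020, 110.0),
        ("Springfield", 2021, 120.0),
        ("Springfield", 2022, 130.0),
    ]
    conn.executemany("INSERT INTO prices VALUES (?, ?, ?)", rows)
    return conn


def retrieve(conn: sqlite3.Connection, city: str) -> list[tuple[int, float]]:
    """Retriever role: fetch the historical series.

    Assumption: in an agent framework an LLM would write this query;
    here it is a fixed, parameterized statement.
    """
    sql = "SELECT year, avg_price FROM prices WHERE city = ? ORDER BY year"
    return conn.execute(sql, (city,)).fetchall()


def forecast(history: list[tuple[int, float]], target_year: int) -> float:
    """Forecaster role: extrapolate with an ordinary least-squares line.

    Assumption: this simple trend model is a stand-in for an external
    time-series model.
    """
    n = len(history)
    xs = [year for year, _ in history]
    ys = [price for _, price in history]
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y + slope * (target_year - mean_x)


def analyze(city: str, target_year: int, value: float) -> str:
    """Analyzer role: render the forecast in one standardized answer format."""
    return f"Forecast average price for {city} in {target_year}: {value:.2f}"


if __name__ == "__main__":
    conn = build_demo_db()
    history = retrieve(conn, "Springfield")
    prediction = forecast(history, 2024)
    print(analyze("Springfield", 2024, prediction))
```

Keeping each role behind its own function means the forecasting model or the query generator can be replaced without touching the answer formatting.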