Tabular data analysis is crucial in many fields, and large language models show promise in this area. However, current research mostly focuses on rudimentary tasks such as Text2SQL and TableQA, neglecting advanced analysis such as forecasting and chart generation. To address this gap, we develop the Text2Analysis benchmark, which incorporates advanced analysis tasks that go beyond SQL-compatible operations and require more in-depth analysis. We also develop five innovative and effective annotation methods that harness the capabilities of large language models to improve both data quality and quantity. Additionally, we include unclear queries that resemble real-world user questions, to test how well models can understand and handle them. In total, we collect 2249 query-result pairs over 347 tables. We evaluate five state-of-the-art models using three different metrics, and the results show that our benchmark poses a considerable challenge in the field of tabular data analysis, paving the way for more advanced research.