Data analysis is a crucial analytical process that generates in-depth studies and conclusive insights to comprehensively answer a user's query over tabular data. In this work, we propose new resources and benchmarks to inspire future research on this crucial yet challenging and under-explored task. However, collecting expert-curated data analysis annotations can be prohibitively expensive. We therefore propose to automatically generate high-quality answer annotations by leveraging the code-generation capabilities of LLMs with a multi-turn prompting technique. We construct the DACO dataset, which contains (1) 440 databases of tabular data collected from real-world scenarios, (2) ~2k query-answer pairs that can serve as weak supervision for model training, and (3) a compact but high-quality test set with human-refined annotations that serves as our main evaluation benchmark. We train a 6B-parameter supervised fine-tuning (SFT) model on the DACO dataset and find that it learns reasonable data analysis capabilities. To further align the model with human preferences, we use reinforcement learning to encourage analyses that humans perceive as helpful, and we design a set of dense rewards that propagate the sparse human-preference reward to intermediate code-generation steps. Human annotators judge our DACO-RL algorithm to produce more helpful answers than the SFT model in 57.72% of cases, validating its effectiveness. Data and code are released at https://github.com/shirley-wu/daco