Query-focused summarization over multi-table data is a challenging yet critical task for extracting precise and relevant information from structured data. Existing methods often rely on complex preprocessing steps and struggle to generalize across domains or handle the logical reasoning required for multi-table queries. In this paper, we propose QueryTableSummarizer++, an end-to-end generative framework leveraging large language models (LLMs) enhanced with table-aware pre-training, query-aligned fine-tuning, and reinforcement learning with feedback. Our method eliminates the need for intermediate serialization steps and directly generates query-relevant summaries. Experiments on a benchmark dataset demonstrate that QueryTableSummarizer++ significantly outperforms state-of-the-art baselines in terms of BLEU, ROUGE, and F1-score. Additional analyses highlight its scalability, generalization across domains, and robust handling of complex queries. Human evaluation further validates the superior quality and practical applicability of the generated summaries, establishing QueryTableSummarizer++ as a highly effective solution for multi-table summarization tasks.