Scientific question answering (SQA) is an important task aimed at answering questions based on scientific papers. However, current SQA datasets cover limited reasoning types and neglect the relevance between tables and text, creating a significant gap with real-world scenarios. To address these challenges, we propose SciTaT, a QA benchmark over scientific tables and text with diverse reasoning types. To cover more reasoning types, we summarize the reasoning types that arise in real-world questions. To involve both tables and text, we require the questions to draw on both modalities wherever possible. Based on SciTaT, we propose a strong baseline, CaR, which combines different reasoning methods to handle different reasoning types and to process tables and text jointly. CaR brings an average improvement of 12.9% over the other baselines on SciTaT, validating its effectiveness. Error analysis reveals the remaining challenges of SciTaT, such as complex numerical calculation and the need for domain knowledge.
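The abstract does not spell out how CaR combines reasoning methods, so the sketch below is only one plausible reading: it assumes a classifier routes each question either to free-form chain-of-thought or to program-based reasoning over linearized tables and text. All names here (`ReasoningType`, `classify_reasoning_type`, `llm_generate`, `run_python`) are hypothetical illustrations, not part of the SciTaT or CaR release.

```python
from dataclasses import dataclass
from enum import Enum, auto

class ReasoningType(Enum):
    LOOKUP = auto()      # answer read directly from a cell or sentence
    NUMERICAL = auto()   # arithmetic over table values
    MULTI_HOP = auto()   # combine table and text evidence

@dataclass
class Example:
    question: str
    table: str  # table linearized to text
    text: str   # surrounding paragraphs

def llm_generate(prompt: str) -> str:
    # Placeholder for an actual LLM call (e.g., an API client).
    raise NotImplementedError("plug in an LLM client here")

def run_python(code: str) -> str:
    # Placeholder for executing model-written code in a sandbox.
    raise NotImplementedError("plug in a sandboxed executor here")

def classify_reasoning_type(ex: Example) -> ReasoningType:
    # Hypothetical heuristic; in practice this could itself be an LLM prompt.
    cues = ("how many", "average", "difference", "percentage")
    if any(c in ex.question.lower() for c in cues):
        return ReasoningType.NUMERICAL
    return ReasoningType.LOOKUP

def answer(ex: Example) -> str:
    context = f"Table:\n{ex.table}\n\nText:\n{ex.text}"
    if classify_reasoning_type(ex) is ReasoningType.NUMERICAL:
        # Program-based reasoning: have the model write code, then execute it.
        prompt = f"{context}\n\nQ: {ex.question}\nWrite Python that prints the answer."
        return run_python(llm_generate(prompt))
    # Free-form chain-of-thought for lookup and multi-hop questions.
    prompt = f"{context}\n\nQ: {ex.question}\nReason step by step, then answer."
    return llm_generate(prompt)
```

The routing-by-reasoning-type design is the point of the sketch: numerical questions benefit from executable arithmetic, while lookup and multi-hop questions over mixed table/text evidence are better served by free-form reasoning.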