Elucidating the reasoning process from question to answer with structured explanations is crucial, as it significantly enhances the interpretability, traceability, and trustworthiness of question-answering (QA) systems. However, structured explanations require models to perform intricately structured reasoning, which poses great challenges. Most existing methods focus on single-step reasoning through supervised learning and ignore the logical dependencies between steps. Moreover, existing reinforcement learning (RL) based methods overlook the structured relationships between steps, underutilizing the potential of RL in structured reasoning. In this paper, we propose SEER, a novel method that maximizes a structure-based return to facilitate structured reasoning and explanation. The structure-based return precisely describes the hierarchical and branching structure inherent in structured reasoning, effectively capturing the intricate relationships between reasoning steps. In addition, we introduce a fine-grained reward function to meticulously delineate diverse reasoning steps. Extensive experiments show that SEER significantly outperforms state-of-the-art methods, achieving an absolute improvement of 6.9% over RL-based methods on EntailmentBank and a 4.4% average improvement on the STREET benchmark, while exhibiting outstanding efficiency and cross-dataset generalization performance.