Elucidating the reasoning process from question to answer with structured explanations is crucial, as it significantly enhances the interpretability, traceability, and trustworthiness of question-answering (QA) systems. However, structured explanations demand that models perform intricate structured reasoning, which poses great challenges. Most existing methods focus on single-step reasoning through supervised learning, ignoring logical dependencies between steps. Moreover, existing reinforcement learning (RL) based methods overlook structured relationships, underutilizing the potential of RL in structured reasoning. In this paper, we propose SEER, a novel method that maximizes a structure-based return to facilitate structured reasoning and explanation. Our proposed structure-based return precisely describes the hierarchical and branching structure inherent in structured reasoning, effectively capturing the intricate relationships between different reasoning steps. In addition, we introduce a fine-grained reward function to meticulously delineate diverse reasoning steps. Extensive experiments show that SEER significantly outperforms state-of-the-art methods, achieving an absolute improvement of 6.9% over RL-based methods on EntailmentBank and a 4.4% average improvement on the STREET benchmark, while exhibiting outstanding efficiency and cross-dataset generalization performance. Our code is available at https://github.com/Chen-GX/SEER.