Elucidating the reasoning process from question to answer with structured explanations is fundamentally crucial, as it significantly enhances the interpretability and trustworthiness of question-answering (QA) systems. However, structured explanations require models to perform intricate structured reasoning, which poses great challenges. Most existing methods focus on single-step reasoning through supervised learning, ignoring logical dependencies between steps. Meanwhile, existing reinforcement learning (RL)-based methods overlook the structured relationships, limiting RL's potential in structured reasoning. In this paper, we propose SEER, a novel method that maximizes a structure-based return to facilitate structured reasoning and explanation. Our proposed structure-based return precisely describes the hierarchical and branching structure inherent in structured reasoning, effectively capturing the intricate relationships between states. We also introduce a fine-grained reward function to meticulously delineate diverse reasoning steps. Extensive experiments show that SEER significantly outperforms state-of-the-art methods, achieving an absolute improvement of 6.9% over RL-based methods on EntailmentBank and a 4.4% average improvement on the STREET benchmark, while exhibiting outstanding efficiency and cross-dataset generalization.