Extensive work has shown that the performance and interpretability of commonsense reasoning can be improved via knowledge-augmented reasoning methods, where the knowledge that underpins the reasoning process is explicitly verbalized and utilized. However, existing implementations, including "chain-of-thought" and its variants, fall short in capturing the introspective nature of knowledge required in commonsense reasoning, and in accounting for the mutual adaptation between the generation and utilization of knowledge. We propose a novel method to develop an introspective commonsense reasoner, Crystal. To tackle commonsense problems, it first introspects for knowledge statements related to the given question, and subsequently makes an informed prediction that is grounded in the previously introspected knowledge. The knowledge introspection and knowledge-grounded reasoning modes of the model are tuned via reinforcement learning to mutually adapt, where the reward derives from the feedback given by the model itself. Experiments show that Crystal significantly outperforms both the standard supervised finetuning and chain-of-thought distilled methods, and enhances the transparency of the commonsense reasoning process. Our work ultimately validates the feasibility and potential of reinforcing a neural model with self-feedback.
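The two-stage flow described above (introspect for knowledge, then predict grounded in that knowledge) and the idea of a reward derived from the model's own feedback can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the function names, the stub model, and in particular the reward definition (the gain in the model's own probability of the gold answer when the knowledge is supplied) are hypothetical and are not taken from the abstract, which does not specify how the reward is computed.

```python
from typing import Callable, Dict, List, Optional, Tuple

# Hypothetical interfaces (illustrative names, not from the paper).
Introspect = Callable[[str], str]
Score = Callable[[str, List[str], Optional[str]], Dict[str, float]]


def answer_with_introspection(
    question: str, choices: List[str], introspect: Introspect, score: Score
) -> Tuple[str, str]:
    """Stage 1: verbalize knowledge. Stage 2: predict grounded in it."""
    knowledge = introspect(question)
    probs = score(question, choices, knowledge)
    prediction = max(probs, key=probs.get)
    return knowledge, prediction


def self_feedback_reward(
    question: str, choices: List[str], gold: str, knowledge: str, score: Score
) -> float:
    """Assumed toy reward: how much the knowledge raises the model's own
    probability of the gold answer, relative to answering without it."""
    with_knowledge = score(question, choices, knowledge)[gold]
    without_knowledge = score(question, choices, None)[gold]
    return with_knowledge - without_knowledge


# Stub "model" for demonstration only; a real system would call one
# language model in two different modes.
def toy_introspect(question: str) -> str:
    return "Ice is frozen water and melts when heated."


def toy_score(
    question: str, choices: List[str], knowledge: Optional[str]
) -> Dict[str, float]:
    if knowledge is None:
        # No knowledge: uniform over the choices.
        return {c: 1.0 / len(choices) for c in choices}
    # With knowledge: the stub favors the choice "it melts".
    rest = 0.25 / max(len(choices) - 1, 1)
    return {c: (0.75 if c == "it melts" else rest) for c in choices}


question = "What happens to ice left in the sun?"
choices = ["it melts", "it freezes"]
knowledge, prediction = answer_with_introspection(
    question, choices, toy_introspect, toy_score
)
reward = self_feedback_reward(question, choices, "it melts", knowledge, toy_score)
```

In a reinforcement learning setup, a scalar like `reward` would be used to update the knowledge-generating mode, while the prediction mode is trained to make use of the generated knowledge; the sketch only shows how such a signal could be computed from the model's own outputs.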