Neural language models (LMs) have achieved impressive results on various language-based reasoning tasks by utilizing latent knowledge encoded in their own pretrained parameters. To make this reasoning process more explicit, recent works retrieve a rationalizing LM's internal knowledge by training or prompting it to generate free-text rationales, which can be used to guide task predictions made by either the same LM or a separate reasoning LM. However, rationalizing LMs require expensive rationale annotation and/or computation, without any assurance that their generated rationales improve LM task performance or faithfully reflect LM decision-making. In this paper, we propose PINTO, an LM pipeline that rationalizes via prompt-based learning, and learns to faithfully reason over rationales via counterfactual regularization. First, PINTO maps out a suitable reasoning process for the task input by prompting a frozen rationalizing LM to generate a free-text rationale. Second, PINTO's reasoning LM is fine-tuned to solve the task using the generated rationale as context, while regularized to output less confident predictions when the rationale is perturbed. Across four datasets, we show that PINTO significantly improves the generalization ability of the reasoning LM, yielding higher performance on both in-distribution and out-of-distribution test sets. Also, we find that PINTO's rationales are more faithful to its task predictions than those generated by competitive baselines.
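The second stage described above, fine-tuning the reasoning LM while discouraging confident predictions on perturbed rationales, can be illustrated with a small loss sketch. This is a minimal sketch under stated assumptions, not the authors' implementation: the abstract does not specify the perturbation or the form of the regularizer, so the token-masking perturbation, the uniform target distribution, and all names here (`perturb_rationale`, `counterfactual_regularized_loss`, `weight`, `mask_rate`) are hypothetical choices made for illustration. The logits stand in for the reasoning LM's scores over answer choices.

```python
import math
import random


def softmax(logits):
    """Convert raw scores over answer choices into probabilities."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(target, probs, eps=1e-12):
    """Cross-entropy between a target distribution and predicted probabilities."""
    return -sum(t * math.log(p + eps) for t, p in zip(target, probs))


def perturb_rationale(tokens, mask_rate=0.3, mask_token="<mask>", rng=None):
    """Assumed perturbation: randomly replace rationale tokens with a mask token."""
    rng = rng or random.Random(0)
    return [mask_token if rng.random() < mask_rate else t for t in tokens]


def counterfactual_regularized_loss(logits_clean, logits_perturbed, gold_index, weight=1.0):
    """Task loss on the intact rationale plus a penalty on confident
    predictions made from the perturbed rationale.

    Assumption: "less confident" is modeled by pulling the perturbed-input
    prediction toward the uniform distribution over answer choices.
    """
    n = len(logits_clean)
    one_hot = [1.0 if i == gold_index else 0.0 for i in range(n)]
    uniform = [1.0 / n] * n
    task_loss = cross_entropy(one_hot, softmax(logits_clean))
    reg_loss = cross_entropy(uniform, softmax(logits_perturbed))
    return task_loss + weight * reg_loss


# Hypothetical scores over three answer choices; index 0 is the gold answer.
still_confident = counterfactual_regularized_loss([5.0, 0.0, 0.0], [5.0, 0.0, 0.0], 0)
less_confident = counterfactual_regularized_loss([5.0, 0.0, 0.0], [0.0, 0.0, 0.0], 0)
```

Under these assumptions, a reasoning LM that stays confident after the rationale is perturbed (`still_confident`) incurs a larger loss than one whose prediction flattens out (`less_confident`), which is the behavior the regularizer is meant to encourage.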