Large language models (LMs) are capable of generating free-text rationales to aid question answering. However, prior work 1) suggests that useful self-rationalization emerges only at significant scales (e.g., the 175B-parameter GPT-3); and 2) focuses largely on downstream performance, ignoring the semantics of the rationales themselves, e.g., are they faithful, true, and helpful for humans? In this work, we enable small-scale LMs (approximately 200x smaller than GPT-3) to generate rationales that not only improve downstream task performance, but are also more plausible, consistent, and diverse, as assessed by both automatic and human evaluation. Our method, MaRio (Multi-rewArd RatIOnalization), is a multi-reward conditioned self-rationalization algorithm that optimizes multiple distinct properties such as plausibility, diversity, and consistency. Results on five difficult question-answering datasets (StrategyQA, QuaRel, OpenBookQA, NumerSense, and QASC) show that MaRio not only improves task accuracy, but also improves the self-rationalization quality of small LMs across the aforementioned axes more than a supervised fine-tuning (SFT) baseline does. Extensive human evaluations confirm that MaRio rationales are preferred over SFT rationales and show qualitative improvements in plausibility and consistency.