Language models still struggle with moral reasoning, despite their impressive performance on many other tasks. In particular, the Moral Scenarios task in MMLU (Massive Multitask Language Understanding) is among the worst-performing tasks for many language models, including GPT-3. In this work, we propose a new prompting framework, Thought Experiments, which teaches language models to reason better about moral questions using counterfactuals. Experimental results show that our framework elicits counterfactual questions and answers from the model, which in turn improves accuracy on the Moral Scenarios task by 9-16% compared to other zero-shot baselines. Interestingly, unlike in math reasoning tasks, zero-shot Chain-of-Thought (CoT) reasoning does not work out of the box here, and even reduces accuracy by around 4% compared to direct zero-shot prompting. We further observe that with minimal human supervision, in the form of 5 few-shot examples, accuracy on the task can be improved to as much as 80%.
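To make the idea of eliciting counterfactual questions and answers concrete, the following is a minimal sketch of how such a zero-shot prompting pipeline might be wired up. It is not the prompts or step structure used in this work: the function name `thought_experiment_answer`, the `LLM` callable type, the three-step split, and all prompt wording are assumptions made purely for illustration.

```python
from typing import Callable, List

# Hypothetical type: any function that maps a prompt string to a completion.
LLM = Callable[[str], str]


def thought_experiment_answer(scenario: str, choices: List[str], llm: LLM) -> str:
    """Illustrative counterfactual prompting; steps and wording are assumed."""
    # Step 1 (assumed): elicit counterfactual questions about the scenario.
    questions = llm(
        f"Scenario: {scenario}\n"
        "Pose counterfactual questions about this scenario."
    )
    # Step 2 (assumed): have the model answer its own questions.
    answers = llm(
        f"Scenario: {scenario}\n"
        f"Questions: {questions}\n"
        "Answer each question."
    )
    # Step 3 (assumed): request a final choice given the reasoning so far.
    options = "\n".join(f"({chr(65 + i)}) {c}" for i, c in enumerate(choices))
    final = llm(
        f"Scenario: {scenario}\n"
        f"Counterfactual reasoning:\n{questions}\n{answers}\n"
        f"Choose one option:\n{options}\n"
        "Answer with a single letter."
    )
    # Keep only the first character of the reply as the chosen letter.
    return final.strip()[:1].upper()
```

Because the model is passed in as a plain callable, the same function can be exercised with a stub in tests or with any real completion client wrapped in a one-argument function.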