Language models (LMs) can perform in-context learning on multiple-choice reasoning tasks, but all options in these tasks are treated equally. Because humans often eliminate wrong options before picking the final answer, we argue that a similar two-step strategy can make LMs better at these tasks. To this end, we present the Process of Elimination (POE), a two-step scoring method. In the first step, POE scores each option and eliminates those that seem wrong. In the second step, POE masks the eliminated options and makes the final prediction from the remaining ones. Zero-shot experiments on 8 reasoning tasks demonstrate the effectiveness of POE, and a follow-up analysis finds the method especially effective on logical reasoning tasks. We further analyze the effect of masks and show that POE also applies to few-shot settings and to large language models (LLMs) such as ChatGPT.
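The two-step procedure described in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the `scorer` callable, the `[MASK]` placeholder string, and the rule of eliminating options that score below the mean are all hypothetical choices made for this example, since the abstract does not specify them. In practice `scorer` would wrap an LM that assigns a score to each option; here a toy word-overlap scorer stands in for it.

```python
from typing import Callable, List, Sequence

# Assumed signature: takes a question and its options, returns one score per option.
Scorer = Callable[[str, Sequence[str]], List[float]]

MASK = "[MASK]"  # assumed placeholder text for eliminated options


def poe_predict(question: str, options: Sequence[str], scorer: Scorer,
                mask_token: str = MASK) -> int:
    """Return the index of the chosen option using a two-step elimination scheme."""
    # Step 1: score every option and eliminate the ones that look wrong.
    # Assumption: an option is eliminated if it scores below the mean.
    first = scorer(question, options)
    threshold = sum(first) / len(first)
    keep = [s >= threshold for s in first]  # the top-scoring option always survives

    # Step 2: mask the eliminated options, rescore, and choose among survivors only.
    masked = [opt if k else mask_token for opt, k in zip(options, keep)]
    second = scorer(question, masked)
    survivors = [i for i, k in enumerate(keep) if k]
    return max(survivors, key=lambda i: second[i])


def toy_scorer(question: str, options: Sequence[str]) -> List[float]:
    """Stand-in for an LM: counts words shared between question and option."""
    q_words = set(question.lower().split())
    return [float(len(q_words & set(o.lower().split()))) for o in options]


question = "which animal is a large gray animal with a trunk"
options = ["a small bird", "a large gray elephant", "a gray mouse", "a red fish"]
choice = poe_predict(question, options, toy_scorer)
print(options[choice])
```

The final `max` ranges over surviving indices only, so an eliminated option cannot be selected even if the second-pass scorer happens to rate its masked slot highly.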