The zero-shot chain-of-thought (CoT) approach is often used for question answering (QA) with language models (LMs) on tasks that require multiple reasoning steps. However, some QA tasks hinge more on accessing relevant knowledge than on chaining reasoning steps. We introduce a simple prompting technique, called PREP, that uses two instances of an LM: the first (LM1) generates information relevant to the question, and the second (LM2) receives that information in the user prompt and answers the question. This design is intended to make better use of the LM's instruction-following capability. PREP is applicable across various QA tasks without domain-specific prompt engineering. We developed PREP on a dataset of 100 QA questions derived from an extensive schematic dataset specifying the parts and material composition of artifacts. Each question asks which of two artifacts is less likely to share materials with a third artifact, probing the LM's knowledge of materials shared across the part structures of different artifacts. We test our method on our parts-and-materials dataset and on three published commonsense reasoning datasets. The average accuracy of our method is consistently higher than that of all the other tested methods across all the tested datasets.
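The two-stage pipeline described above can be sketched as follows. This is a minimal illustration, not the paper's implementation: `call_lm` is a hypothetical stand-in for any chat-style LM API (system prompt plus user message), replaced here by a deterministic stub; the prompt wording is our own assumption.

```python
def prep_answer(question, call_lm):
    """Two-stage PREP prompting: LM1 elicits relevant knowledge,
    LM2 answers with that knowledge supplied in the user prompt."""
    # Stage 1: LM1 generates information relevant to the question.
    info = call_lm(
        system="Generate background knowledge relevant to the question.",
        user=question,
    )
    # Stage 2: LM2 receives the generated information as part of the
    # user message and produces the final answer.
    answer = call_lm(
        system="Answer the question using the provided information.",
        user=f"Information: {info}\n\nQuestion: {question}",
    )
    return answer

# Deterministic stub LM for illustration only; in practice this would
# be a real chat-completion call to the same underlying model.
def stub_lm(system, user):
    if "background knowledge" in system:
        return "Both a hammer and an axe typically have a steel head and a wooden handle."
    return "The pillow."

question = ("Which is less likely to share materials with an axe: "
            "a hammer or a pillow?")
print(prep_answer(question, stub_lm))
```

Because both stages go through the same `call_lm` interface, the two "instances" are simply two differently instructed calls to one model, which is what lets PREP lean on instruction following rather than task-specific prompts.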