Analogical reasoning is a unique ability of humans to address unfamiliar challenges by transferring strategies from relevant past experiences. One key finding in psychology is that, compared with irrelevant past experiences, recalling relevant ones helps humans handle new tasks better. Coincidentally, the NLP community has recently found that having large language models (LLMs) self-generate relevant examples in the context helps them solve a given problem better than hand-crafted prompts do. However, it is not yet clear whether relevance is the key factor in eliciting this capability, i.e., can LLMs benefit more from self-generated relevant examples than from irrelevant ones? In this work, we systematically explore whether LLMs can truly perform analogical reasoning on a diverse set of reasoning tasks. With extensive experiments and analysis, we show that self-generated random examples can, surprisingly, achieve comparable or even better performance, e.g., a 4% performance boost on GSM8K with random biological examples. We find that the accuracy of self-generated examples is the key factor, and we subsequently design two improved methods with significantly reduced inference costs. Overall, we aim to advance a deeper understanding of LLM analogical reasoning and hope this work stimulates further research on the design of self-generated contexts.
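To make the comparison in the abstract concrete, the following is a minimal sketch of how the two prompting conditions (self-generated relevant examples versus self-generated random examples) might be constructed. The function name `build_self_generation_prompt`, the template wording, and the sample problem are all hypothetical illustrations; they are not the prompts used in the paper.

```python
from typing import Optional


def build_self_generation_prompt(
    problem: str,
    n_examples: int = 3,
    random_topic: Optional[str] = None,
) -> str:
    """Build a prompt asking the model to write its own examples before solving.

    Hypothetical template, for illustration only.
    - random_topic is None: ask for examples relevant to the problem.
    - random_topic is set:  ask for examples on that unrelated topic.
    """
    if random_topic is None:
        kind = "relevant to the problem below"
    else:
        kind = f"about {random_topic} and unrelated to the problem below"
    return (
        f"First, write {n_examples} example problems that are {kind}, "
        "and solve each one step by step.\n"
        "Then solve the following problem step by step and state the final answer.\n\n"
        f"Problem: {problem}"
    )


# Hypothetical sample problem in the style of a grade-school math question.
problem = "A farmer collects 12 eggs and sells 5. How many eggs are left?"

# Condition 1: self-generated relevant examples.
relevant_prompt = build_self_generation_prompt(problem)

# Condition 2: self-generated random examples (here, biology).
random_prompt = build_self_generation_prompt(problem, random_topic="biology")
```

Under this sketch, the two conditions differ only in the instruction about what kind of examples to generate, so any accuracy gap between them can be attributed to the relevance of the examples rather than to the rest of the prompt.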