Large Language Models (LLMs) have achieved remarkable performance on objective tasks such as open-domain question answering and mathematical reasoning, which can often be solved by recalling learned factual knowledge or by chain-of-thought style reasoning. However, we find that LLM performance on subjective tasks, such as metaphor recognition and dark humor detection, remains unsatisfactory. Compared with objective tasks, subjective tasks depend more on interpretation or emotional response than on a universally accepted reasoning pathway. Based on these task characteristics and the strong dialogue-generation capabilities of LLMs, we propose RiC (Reasoning in Conversation), a method that solves subjective tasks through dialogue simulation. The motivation of RiC is to mine useful contextual information by simulating dialogues instead of supplying chain-of-thought style rationales, thereby surfacing potentially useful knowledge behind the dialogues for producing the final answers. We evaluate both API-based and open-source LLMs, including GPT-4, ChatGPT, and OpenChat, on twelve tasks. Experimental results show that RiC yields significant improvements over various baselines.