A promising trend is emerging of using large language models (LLMs) to generate code-like plans for complex inference tasks such as visual reasoning. This paradigm, known as LLM-based planning, provides flexibility in problem solving and offers better interpretability. However, current research is mostly limited to basic scenarios of simple questions that can be answered straightforwardly in a few inference steps. Planning for the more challenging multi-hop visual reasoning tasks remains under-explored. Specifically, in multi-hop reasoning, the trade-off between accuracy and the complexity of plan searching becomes prominent. Prevailing algorithms either address efficiency by employing fast one-stop generation or adopt a complex iterative generation method to improve accuracy. Neither balances efficiency and performance. Drawing inspiration from the dual system of cognition in the human brain, with its fast and slow thinking processes, we propose a hierarchical plan-searching algorithm that integrates one-stop reasoning (fast) and Tree-of-Thought (slow). Our approach achieves strong performance while significantly reducing the number of inference steps. Moreover, we repurpose the PTR and CLEVER datasets to develop a systematic framework for evaluating the performance and efficiency of LLM-based plan-search algorithms on reasoning tasks at different levels of difficulty. Extensive experiments demonstrate the superiority of the proposed algorithm in terms of performance and efficiency. The dataset and code will be released soon.
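The abstract does not say how the fast and slow modes are combined, so the following is only an illustrative sketch of one plausible arrangement: try a one-stop plan first, and fall back to a small tree search over partial plans only when that plan is rejected. Every name here (`hierarchical_plan_search`, `fast_generate`, `propose_steps`, `score`, `is_complete`, `accept_threshold`, `beam_width`) is hypothetical, the LLM is replaced by toy stand-in functions, and the beam-style search is a simplification standing in for Tree-of-Thought rather than the authors' method.

```python
from typing import Callable, List, Tuple

Plan = List[str]


def hierarchical_plan_search(
    question: str,
    fast_generate: Callable[[str], Plan],
    propose_steps: Callable[[str, Plan], List[str]],
    score: Callable[[str, Plan], float],
    is_complete: Callable[[str, Plan], bool],
    accept_threshold: float = 0.8,  # assumed acceptance rule, not from the paper
    beam_width: int = 2,
    max_depth: int = 6,
) -> Tuple[Plan, int]:
    """Return (plan, number of generation calls). Scoring calls are not counted."""
    # Fast mode: a single one-stop generation of the whole plan.
    calls = 1
    fast_plan = fast_generate(question)
    if is_complete(question, fast_plan) and score(question, fast_plan) >= accept_threshold:
        return fast_plan, calls

    # Slow mode: expand partial plans step by step, keeping the best few.
    frontier: List[Plan] = [[]]
    for _ in range(max_depth):
        candidates: List[Plan] = []
        for partial in frontier:
            calls += 1  # one generation call per expanded partial plan
            for step in propose_steps(question, partial):
                candidates.append(partial + [step])
        if not candidates:
            break
        candidates.sort(key=lambda p: score(question, p), reverse=True)
        frontier = candidates[:beam_width]
        for plan in frontier:
            if is_complete(question, plan):
                return plan, calls
    # Nothing complete was found: fall back to the fast plan.
    return fast_plan, calls


# Toy stand-ins for the LLM and the verifier (purely hypothetical data).
TARGETS = {
    "easy": ["filter(cube)", "count()"],
    "hard": ["filter(cube)", "relate(left)", "filter(sphere)", "query(color)"],
}


def toy_fast(question: str) -> Plan:
    # The toy "fast" generator only ever produces two steps.
    return TARGETS[question][:2]


def toy_propose(question: str, partial: Plan) -> List[str]:
    target = TARGETS[question]
    if len(partial) >= len(target):
        return []
    return [target[len(partial)], "noop()"]  # the correct step plus a distractor


def toy_score(question: str, plan: Plan) -> float:
    # Fraction of the target plan matched by the longest correct prefix.
    target = TARGETS[question]
    matched = 0
    for got, want in zip(plan, target):
        if got != want:
            break
        matched += 1
    return matched / len(target)


def toy_complete(question: str, plan: Plan) -> bool:
    return plan == TARGETS[question]


easy_plan, easy_calls = hierarchical_plan_search(
    "easy", toy_fast, toy_propose, toy_score, toy_complete)
hard_plan, hard_calls = hierarchical_plan_search(
    "hard", toy_fast, toy_propose, toy_score, toy_complete)
print(easy_plan, easy_calls)
print(hard_plan, hard_calls)
```

In this toy setup the two-step question is settled by the single fast call, while the four-step question triggers the search and costs more generation calls. That is the efficiency-versus-accuracy trade-off the abstract describes, shown here only under the assumptions stated above.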