A promising trend is emerging: using large language models (LLMs) to generate code-like plans for complex inference tasks such as visual reasoning. This paradigm, known as LLM-based planning, offers flexibility in problem solving and better interpretability. However, current research is mostly limited to basic scenarios with simple questions that can be answered straightforwardly in a few inference steps. Planning for more challenging multi-hop visual reasoning tasks remains under-explored. Specifically, in multi-hop reasoning, the trade-off between accuracy and the complexity of plan searching becomes prominent. Prevailing algorithms either address efficiency by using fast one-stop generation or adopt a complex iterative generation method to improve accuracy. Both fail to balance efficiency and performance. Drawing inspiration from the dual-system account of human cognition, with its fast and slow thinking processes, we propose a hierarchical plan-searching algorithm that integrates one-stop reasoning (fast) and Tree-of-thought (slow). Our approach performs well while significantly reducing the number of inference steps. Moreover, we repurpose the PTR and the CLEVER datasets, developing a systematic framework for evaluating the performance and efficiency of LLM-based plan-search algorithms on reasoning tasks at different levels of difficulty. Extensive experiments demonstrate the superiority of the proposed algorithm in terms of performance and efficiency. The dataset and code will be released soon.
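The abstract does not say how the fast and slow modes are combined, so the following is only a minimal sketch under one plausible assumption: try a single one-stop plan first, and fall back to a beam-limited tree search over plan steps only when that plan fails a check. Every name here (`hierarchical_search`, `fast_plan`, `propose_steps`, `is_valid`, `score`) is hypothetical and stands in for LLM calls and a plan verifier that the paper may implement quite differently.

```python
from typing import Callable, List, Optional, Tuple

Plan = List[str]  # a plan is modelled as a list of code-like steps


def hierarchical_search(
    fast_plan: Callable[[], Plan],                # hypothetical: one-stop generation
    propose_steps: Callable[[Plan], List[str]],   # hypothetical: next-step proposals
    is_valid: Callable[[Plan], bool],             # hypothetical: plan verifier
    score: Callable[[Plan], float],               # hypothetical: partial-plan scorer
    max_depth: int = 4,
    beam_width: int = 2,
) -> Tuple[Optional[Plan], int]:
    """Return (plan, number of generation calls); plan is None on failure."""
    # Fast mode: one generation call produces a complete plan.
    calls = 1
    plan = fast_plan()
    if is_valid(plan):
        return plan, calls

    # Slow mode (assumed fallback): level-by-level tree search with a beam.
    frontier: List[Plan] = [[]]
    for _ in range(max_depth):
        candidates: List[Plan] = []
        for partial in frontier:
            calls += 1  # one generation call per expanded node
            for step in propose_steps(partial):
                candidate = partial + [step]
                if is_valid(candidate):
                    return candidate, calls
                candidates.append(candidate)
        # Keep only the best-scoring partial plans for the next level.
        candidates.sort(key=score, reverse=True)
        frontier = candidates[:beam_width]
        if not frontier:
            break
    return None, calls
```

Under this arrangement, questions that the one-stop plan already answers cost a single generation call, and the more expensive tree search is paid for only on harder, multi-hop questions, which is one way to read the efficiency claim in the abstract.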