Large Language Models (LLMs) have demonstrated impressive capabilities across a wide range of tasks. However, they often struggle with spatial reasoning, an essential component of reasoning and inference that requires understanding complex relationships between objects in space. This paper proposes a novel neural-symbolic framework that enhances LLMs' spatial reasoning abilities. We evaluate our approach on two benchmark datasets, StepGame and SparQA, implementing three distinct strategies: (1) ASP (Answer Set Programming)-based symbolic reasoning, (2) an LLM + ASP pipeline using DSPy, and (3) Fact + Logical rules. Our experiments demonstrate significant improvements over baseline prompting methods, with accuracy increases of 40-50% on the StepGame dataset and 3-13% on the more complex SparQA dataset. The "LLM + ASP" pipeline achieves particularly strong results on Finding Relations (FR) and Finding Block (FB) questions, though performance varies across question types. These results suggest that while neural-symbolic approaches offer a promising direction for enhancing spatial reasoning in LLMs, their effectiveness depends heavily on the specific task characteristics and implementation strategies. Taken together, the strategies form an integrated, simple yet effective neural-symbolic pipeline for boosting spatial reasoning in LLMs. The pipeline and its strategies are also broadly applicable to other reasoning domains in LLMs, such as temporal reasoning and deductive inference.
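To make the "facts plus logical rules" idea concrete, the following is a minimal sketch of multi-hop spatial inference over StepGame-style relations. It is not the pipeline described in the paper: a plain-Python breadth-first search stands in for the ASP solver, the facts are written by hand rather than extracted by an LLM, and the names `OFFSETS` and `infer_relation` as well as the `(a, relation, b)` triple format are assumptions made purely for illustration.

```python
from collections import deque

# Hypothetical encoding: each relation is a unit offset on a 2D grid.
OFFSETS = {
    "left": (-1, 0), "right": (1, 0), "above": (0, 1), "below": (0, -1),
    "upper-left": (-1, 1), "upper-right": (1, 1),
    "lower-left": (-1, -1), "lower-right": (1, -1),
}


def sign(n):
    """Return -1, 0, or 1 according to the sign of n."""
    return (n > 0) - (n < 0)


def infer_relation(facts, source, target):
    """Return the relation of `source` to `target`, or None if unconnected.

    `facts` is a list of (a, relation, b) triples read as
    "a is <relation> of b".
    """
    # Each edge stores where the neighbour sits relative to the current node.
    graph = {}
    for a, rel, b in facts:
        dx, dy = OFFSETS[rel]
        graph.setdefault(b, []).append((a, dx, dy))
        graph.setdefault(a, []).append((b, -dx, -dy))  # inverse relation

    # Breadth-first search from the target, accumulating relative positions.
    positions = {target: (0, 0)}
    queue = deque([target])
    while queue:
        node = queue.popleft()
        x, y = positions[node]
        for neighbour, dx, dy in graph.get(node, []):
            if neighbour not in positions:
                positions[neighbour] = (x + dx, y + dy)
                queue.append(neighbour)

    if source not in positions:
        return None

    # Reduce the accumulated offset to one of the eight directions.
    direction = (sign(positions[source][0]), sign(positions[source][1]))
    for name, offset in OFFSETS.items():
        if offset == direction:
            return name
    return "overlap"


facts = [("A", "left", "B"), ("B", "above", "C")]
print(infer_relation(facts, "A", "C"))  # upper-left
```

In a neural-symbolic setup of the kind the abstract outlines, the LLM would be responsible for turning the natural-language story into such facts, while a symbolic component (ASP in the paper, the search above in this sketch) would carry out the multi-hop composition deterministically.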