Large Language Models (LLMs) have shown remarkable capabilities in natural language processing, mathematical problem solving, and program synthesis. However, their effectiveness in long-horizon planning and higher-order reasoning remains limited and fragile. This paper explores an approach to enhancing LLM performance on a classical robotic planning task by integrating solver-generated feedback. We explore four feedback strategies, including visual feedback; we apply fine-tuning; and we evaluate three different LLMs on 10 standard and 100 randomly generated planning problems. Our results suggest that solver-generated feedback improves the LLMs' ability to solve moderately difficult problems, while the harder problems remain out of reach. The study provides a detailed analysis of the effects of the different hinting strategies and of the distinct planning tendencies of the evaluated LLMs.