Large Language Models (LLMs) are powerful tools for sequence modeling, with an innate capacity for general pattern recognition. However, their broader spatial reasoning capabilities, especially on numerical trajectory data, remain insufficiently explored. In this paper, we investigate the out-of-the-box performance of ChatGPT-3.5, ChatGPT-4, and Llama 2 7B on 3D robotic trajectory data from the CALVIN benchmark and on associated tasks, including 2D directional and shape labeling. Additionally, we introduce a novel prefix-based prompting mechanism that yields a 33% improvement on the 3D trajectory data and an improvement of up to 10% on SpartQA tasks over zero-shot prompting (with gains for other prompting types as well). Our experiments with 3D trajectory data offer an intriguing glimpse into how LLMs engage with numerical and spatial information, and they lay a foundation for identifying target areas for future improvement.