Spatial reasoning focuses on locating target objects based on spatial relations in 3D scenes, and it plays a crucial role in developing intelligent embodied agents. Because 3D scene-language paired data is scarce, it is challenging to train models with strong reasoning ability from scratch. Previous approaches have injected 3D scene representations into the input space of Large Language Models (LLMs) to leverage their pretrained comprehension and reasoning abilities for spatial reasoning. However, models that encode absolute positions struggle to extract spatial relations from prematurely fused features, while methods that explicitly encode all spatial relations as input tokens scale poorly, because the number of relations is quadratic in the number of objects. To address these limitations, we propose QuatRoPE, a novel positional embedding method whose input length is linear in the number of objects and which explicitly computes pairwise spatial relations through the dot product in attention layers. QuatRoPE encodes each 3D coordinate holistically as a vector, which guarantees a high degree of spatial consistency and preserves the scene's geometric integrity. Additionally, we introduce the Isolated Gated RoPE Extension (IGRE), which limits QuatRoPE's influence to object-related tokens, thereby minimizing interference with the LLM's existing positional embeddings and preserving the LLM's original capabilities. Extensive experiments demonstrate the effectiveness of our approaches. The code and data are available at https://github.com/oceanflowlab/QuatRoPE.
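To illustrate how a rotary-style embedding can expose pairwise relations through the attention dot product while keeping one token per object, the following is a minimal sketch. It is not the QuatRoPE formulation: it uses a plain axis-wise rotary embedding as a stand-in, and the function names `rotate_by_position` and `attention_score`, the feature layout, and the frequency base are all assumptions made for illustration. What it shows is the general property the abstract relies on: after rotating queries and keys by their own 3D positions, their dot product depends only on the relative offset between the two objects.

```python
import math


def rotate_by_position(x, pos, base=10000.0):
    """Rotate feature vector x by angles derived from a 3D position.

    Illustrative stand-in only (not QuatRoPE): x is split into three equal
    chunks, one per axis, and each chunk is rotated in 2D pairs by an angle
    proportional to that axis coordinate.
    """
    d = len(x)
    if d % 6 != 0:
        raise ValueError("feature dimension must be a multiple of 6")
    chunk = d // 3      # features assigned to each axis
    half = chunk // 2   # number of 2D rotation pairs per axis
    out = [0.0] * d
    for axis in range(3):
        start = axis * chunk
        for i in range(half):
            # Assumed frequency schedule, borrowed from standard RoPE.
            theta = pos[axis] * base ** (-i / half)
            c, s = math.cos(theta), math.sin(theta)
            a, b = x[start + i], x[start + half + i]
            out[start + i] = a * c - b * s
            out[start + half + i] = a * s + b * c
    return out


def attention_score(q, k, pos_q, pos_k):
    """Unnormalized attention logit between two position-rotated tokens."""
    rq = rotate_by_position(q, pos_q)
    rk = rotate_by_position(k, pos_k)
    return sum(u * v for u, v in zip(rq, rk))
```

Under these assumptions, translating both objects by the same offset leaves `attention_score` unchanged, while moving only one of them changes it. The relation between every pair of objects is therefore computed inside attention from N position-tagged tokens, rather than being supplied as N² explicit relation tokens.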