Spatio-Temporal Logic (SpaTiaL) offers a principled formalism for expressing geometric spatial requirements, an essential component of robotic manipulation, where object locations, neighborhood relations, pose constraints, and interactions directly determine task success. Yet prior work has largely relied on standard temporal logic (TL), which models only robot trajectories and overlooks object-level interactions. Existing datasets, built from randomly generated TL formulas paired with natural-language descriptions, therefore cover temporal operators but fail to represent the layered spatial relations that manipulation tasks depend on. To address this gap, we introduce a dataset generation framework that synthesizes SpaTiaL specifications and converts them into natural-language descriptions through a deterministic, semantics-preserving back-translation procedure. This pipeline produces the NL2SpaTiaL dataset, which aligns natural language with multi-level spatial relations and temporal objectives to reflect the compositional structure of manipulation tasks. Building on this foundation, we propose a translation-verification framework equipped with a language-based semantic checker that ensures the generated SpaTiaL formulas faithfully encode the semantics of the input description. Experiments across a suite of manipulation tasks show that SpaTiaL-based representations yield more interpretable, verifiable, and compositional grounding for instruction following. Project website: https://sites.google.com/view/nl2spatial
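To make the idea of deterministic back-translation concrete, the following is a minimal sketch, not the authors' pipeline. It assumes a toy formula grammar with a few spatial relations and temporal operators, samples a formula from a seeded random generator, and renders it to English through fixed templates, so the same formula always yields the same sentence. All class names, relation names, operator symbols, and templates here are hypothetical and do not reflect the SpaTiaL library's API or concrete syntax.

```python
import random
from dataclasses import dataclass
from typing import Union


# Hypothetical mini-grammar for illustration only (not SpaTiaL's real syntax).
@dataclass(frozen=True)
class Rel:
    name: str  # spatial relation between two objects
    a: str
    b: str


@dataclass(frozen=True)
class Unary:
    op: str  # "not", "eventually", or "always"
    arg: "Formula"


@dataclass(frozen=True)
class Binary:
    op: str  # "and", "or", or "until"
    left: "Formula"
    right: "Formula"


Formula = Union[Rel, Unary, Binary]

# Fixed templates: one formula node maps to exactly one phrase.
REL_TEXT = {
    "left_of": "{a} is left of {b}",
    "close_to": "{a} is close to {b}",
    "touching": "{a} touches {b}",
}
UNARY_TEXT = {
    "not": "it is not the case that ({x})",
    "eventually": "eventually ({x})",
    "always": "always ({x})",
}
BINARY_TEXT = {
    "and": "({l}) and ({r})",
    "or": "({l}) or ({r})",
    "until": "({l}) holds until ({r})",
}
UNARY_SYM = {"not": "!", "eventually": "F", "always": "G"}
BINARY_SYM = {"and": "&", "or": "|", "until": "U"}


def to_formula(f: Formula) -> str:
    """Render the tree in an illustrative symbolic syntax."""
    if isinstance(f, Rel):
        return f"{f.name}({f.a}, {f.b})"
    if isinstance(f, Unary):
        return f"{UNARY_SYM[f.op]}({to_formula(f.arg)})"
    return f"({to_formula(f.left)} {BINARY_SYM[f.op]} {to_formula(f.right)})"


def to_text(f: Formula) -> str:
    """Back-translate the tree to English by recursive template filling."""
    if isinstance(f, Rel):
        return REL_TEXT[f.name].format(a=f.a, b=f.b)
    if isinstance(f, Unary):
        return UNARY_TEXT[f.op].format(x=to_text(f.arg))
    return BINARY_TEXT[f.op].format(l=to_text(f.left), r=to_text(f.right))


def sample(rng: random.Random, objects: list, depth: int) -> Formula:
    """Draw a random formula of bounded depth; a seeded rng makes it reproducible."""
    if depth == 0 or rng.random() < 0.3:
        a, b = rng.sample(objects, 2)  # two distinct objects
        return Rel(rng.choice(sorted(REL_TEXT)), a, b)
    if rng.random() < 0.5:
        return Unary(rng.choice(sorted(UNARY_TEXT)), sample(rng, objects, depth - 1))
    return Binary(
        rng.choice(sorted(BINARY_TEXT)),
        sample(rng, objects, depth - 1),
        sample(rng, objects, depth - 1),
    )
```

Calling `sample(random.Random(0), ["cup", "plate", "bowl"], 3)` and passing the result to both `to_formula` and `to_text` gives one (formula, description) pair. Because the templates keep explicit parentheses, the nesting of the formula is still recoverable from the sentence, which is the property a semantics-preserving back-translation needs; a real pipeline would presumably use more natural phrasing than this sketch.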