Learning abstractions directly from data is a core challenge in robotics. Humans naturally operate at an abstract level, reasoning over high-level subgoals while delegating execution to low-level motor skills, an ability that enables efficient problem solving in complex environments. In robotics, abstractions and hierarchical reasoning have long been central to planning, yet they are typically hand-engineered, demanding significant human effort and limiting scalability. Automating the discovery of useful abstractions directly from visual data would make planning frameworks more scalable and more applicable to real-world robotic domains. In this work, we focus on rearrangement tasks where the state is represented with raw images, and propose a method to induce discrete, graph-structured abstractions by combining structural constraints with an attention-guided visual distance. Our approach leverages the inherent bipartite structure of rearrangement problems, integrating these structural constraints and visual embeddings into a unified framework. This enables the autonomous discovery of abstractions from vision alone, which can subsequently support high-level planning. We evaluate our method on two rearrangement tasks in simulation and show that it consistently identifies meaningful abstractions that facilitate effective planning, and that it outperforms existing approaches.