The central challenge in robotic manipulation of deformable objects lies in aligning high-level semantic instructions with physical interaction points under complex appearance and texture variations. Due to their near-infinite degrees of freedom, complex dynamics, and heterogeneous surface patterns, such objects cause existing vision-based affordance prediction methods to suffer from boundary overflow and fragmented functional regions. To address these issues, we propose TRACER, a Texture-Robust Affordance Chain-of-thought with dEformable-object Refinement framework, which establishes a cross-hierarchical mapping from hierarchical semantic reasoning to appearance-robust and physically consistent functional region refinement. Specifically, a Tree-structured Affordance Chain-of-Thought (TA-CoT) is formulated to decompose high-level task intentions into hierarchical sub-task semantics, providing consistent guidance across execution stages. To ensure spatial integrity, a Spatial-Constrained Boundary Refinement (SCBR) mechanism is introduced to suppress prediction spillover, guiding the perceptual response to converge toward authentic interaction manifolds. Furthermore, an Interactive Convergence Refinement Flow (ICRF) is developed to aggregate discrete pixels corrupted by appearance noise, significantly enhancing the spatial continuity and physical plausibility of the identified functional regions. Extensive experiments on the Fine-AGDDO15 dataset and a real-world robotic platform demonstrate that TRACER significantly improves affordance grounding precision across the diverse textures and patterns inherent to deformable objects. More importantly, it raises the success rate of long-horizon tasks, effectively bridging the gap between high-level semantic reasoning and low-level physical execution. The source code and dataset will be made publicly available at https://github.com/Dikay1/TRACER.