Real-world contact-rich manipulation requires robots to perceive temporal tactile feedback, capture subtle surface deformations, and reason about object properties as well as force dynamics. Although optical tactile sensors are uniquely capable of providing such rich information, existing tactile datasets and models remain limited: they focus primarily on object-level attributes (e.g., material) while largely overlooking fine-grained tactile temporal dynamics during physical interactions. We argue that advancing dynamic tactile perception requires a systematic hierarchy of dynamic perception capabilities to guide both data collection and model design. To address the lack of tactile data with rich dynamic information, we present ToucHD, a large-scale hierarchical tactile dataset spanning tactile atomic actions, real-world manipulations, and touch-force paired data. Beyond its scale, ToucHD establishes a comprehensive tactile dynamic data ecosystem that explicitly supports hierarchical perception capabilities from the data perspective. Building on it, we propose AnyTouch 2, a general tactile representation learning framework for diverse optical tactile sensors that unifies object-level understanding with fine-grained, force-aware dynamic perception. The framework captures both pixel-level and action-specific deformations across frames while explicitly modeling physical force dynamics, thereby learning multi-level dynamic perception capabilities from the model perspective. We evaluate our model on benchmarks covering static object properties and dynamic physical attributes, as well as real-world manipulation tasks spanning multiple tiers of dynamic perception capabilities, from basic object-level understanding to force-aware dexterous manipulation. Experimental results demonstrate consistently strong performance across sensors and tasks.