Despite the success of vision-based generalist robotic policies, existing tactile-based policies remain tied to fixed embodiments and sensor setups. This is because tactile signals are highly heterogeneous across hardware, which makes cross-sensor generalization difficult. We present FTP-1, the first generalist foundation tactile policy pretrained to acquire transferable tactile manipulation abilities across diverse sensors and embodiments. FTP-1 supports varied tactile inputs, including image-, array-, and state-based signals. Heterogeneous encoders project these signals into unified morphology-aware latent tokens, which a shared tactile Transformer expert then models jointly. Pretrained on around 3,000 hours of tactile manipulation data aggregated from 26 data sources, spanning human and robot demonstrations across 21 sensors, FTP-1 learns tactile skills that transfer beyond the sensors seen during pretraining. Across downstream finetuning experiments spanning 5 hardware configurations, FTP-1 improves the success rate of contact-rich manipulation on seen sensor setups by +17.2% and, surprisingly, transfers to two previously unseen tactile-sensor setups, achieving a +31% gain in success rate. FTP-1 establishes the first unified foundation baseline for tactile manipulation, giving future tactile policies a shared model-level starting point. Pretrained models, datasets, training code, and additional visualizations are available at https://ftp1-policy.github.io.
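The abstract does not specify how the heterogeneous encoders or the shared Transformer expert are built. The sketch below is only a rough illustration of the general idea: different tactile modalities are projected into tokens of one shared width and then mixed by a single shared attention layer. Everything in it is a hypothetical assumption, not FTP-1's actual architecture. This includes the class names, the token width `D_MODEL`, patchifying tactile images, emitting one token per taxel, adding a per-sensor embedding as a stand-in for "morphology-aware" information, and the single-head attention layer. The weights are random and untrained, and the code is stdlib-only.

```python
import math
import random

D_MODEL = 8  # hypothetical shared token width


def linear(in_dim, out_dim, rng):
    """Random (out_dim x in_dim) matrix standing in for a learned projection."""
    scale = 1.0 / math.sqrt(in_dim)
    return [[rng.gauss(0.0, scale) for _ in range(in_dim)] for _ in range(out_dim)]


def matvec(w, x):
    return [sum(wi * xi for wi, xi in zip(row, x)) for row in w]


def softmax(xs):
    m = max(xs)
    e = [math.exp(x - m) for x in xs]
    s = sum(e)
    return [v / s for v in e]


class ImageEncoder:
    """Hypothetical: image-based sensor -> one token per non-overlapping patch."""
    def __init__(self, patch, rng):
        self.patch = patch
        self.w = linear(patch * patch, D_MODEL, rng)

    def __call__(self, img):  # img: H x W list, H and W divisible by patch
        p = self.patch
        tokens = []
        for r in range(0, len(img), p):
            for c in range(0, len(img[0]), p):
                flat = [img[r + i][c + j] for i in range(p) for j in range(p)]
                tokens.append(matvec(self.w, flat))
        return tokens


class ArrayEncoder:
    """Hypothetical: taxel-array sensor -> one token per taxel."""
    def __init__(self, channels, rng):
        self.w = linear(channels, D_MODEL, rng)

    def __call__(self, taxels):  # taxels: N x channels list
        return [matvec(self.w, t) for t in taxels]


class StateEncoder:
    """Hypothetical: low-dimensional state signal -> a single token."""
    def __init__(self, dim, rng):
        self.w = linear(dim, D_MODEL, rng)

    def __call__(self, state):
        return [matvec(self.w, state)]


class TactileTokenizer:
    """Routes each sensor to its own encoder and adds a per-sensor embedding."""
    def __init__(self, encoders, rng):
        self.encoders = encoders
        self.sensor_emb = {name: [rng.gauss(0.0, 0.02) for _ in range(D_MODEL)]
                           for name in encoders}

    def __call__(self, readings):
        tokens = []
        for name, data in readings.items():
            for tok in self.encoders[name](data):
                tokens.append([a + b for a, b in zip(tok, self.sensor_emb[name])])
        return tokens


class SharedAttention:
    """One single-head self-attention layer (with residual) over all tokens."""
    def __init__(self, rng):
        self.wq, self.wk, self.wv = (linear(D_MODEL, D_MODEL, rng) for _ in range(3))

    def __call__(self, tokens):
        q = [matvec(self.wq, t) for t in tokens]
        k = [matvec(self.wk, t) for t in tokens]
        v = [matvec(self.wv, t) for t in tokens]
        out = []
        for qi, ti in zip(q, tokens):
            scores = [sum(a * b for a, b in zip(qi, kj)) / math.sqrt(D_MODEL) for kj in k]
            attn = softmax(scores)
            mixed = [sum(a * vj[d] for a, vj in zip(attn, v)) for d in range(D_MODEL)]
            out.append([x + y for x, y in zip(ti, mixed)])
        return out


rng = random.Random(0)
tokenizer = TactileTokenizer({
    "gel_image": ImageEncoder(patch=2, rng=rng),       # 4x4 image -> 4 patch tokens
    "taxel_array": ArrayEncoder(channels=3, rng=rng),  # 5 taxels x 3 channels -> 5 tokens
    "gripper_state": StateEncoder(dim=2, rng=rng),     # 2-D state -> 1 token
}, rng)
readings = {
    "gel_image": [[rng.random() for _ in range(4)] for _ in range(4)],
    "taxel_array": [[rng.random() for _ in range(3)] for _ in range(5)],
    "gripper_state": [0.3, 1.0],
}
tokens = tokenizer(readings)
out = SharedAttention(rng)(tokens)
print(len(tokens), len(out[0]))  # prints "10 8"
```

The point of the design is that only the encoders are sensor-specific. Once every modality emits tokens of the same width, one shared model can attend across all of them, whatever the sensor mix.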