Overview of the Proposed DECO Framework.} DECO is a DiT-based policy that decouples multimodal conditioning. Image and action tokens interact via joint self-attention, while proprioceptive states and optional conditions are injected through adaptive layer normalization. Tactile signals are injected via cross-attention, and a lightweight LoRA-based adapter is used for efficient fine-tuning of the pretrained policy. DECO is accompanied by DECO-50, a bimanual dexterous manipulation dataset with tactile sensing that spans 4 scenarios and 28 sub-tasks and contains more than 50 hours of data, approximately 5 million frames, and 8,000 successful trajectories.
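The conditioning scheme in the caption can be illustrated with a minimal NumPy sketch of a single transformer block. Image and action tokens are concatenated into one sequence for joint self-attention. A condition vector (standing in for the embedded proprioceptive state plus optional conditions) produces AdaLN shift, scale, and gate terms. Tactile tokens are used only as keys and values in a cross-attention layer. A LoRA wrapper adds a zero-initialized low-rank update to a frozen projection. Everything here is an illustrative assumption rather than DECO's actual implementation: the names `DecoBlockSketch` and `LoRALinear`, the single-head attention, the ReLU MLP, all dimensions, the choice of which layer gets the LoRA adapter, and the assumption that tactile features are already embedded to the model width. In a diffusion-style DiT the timestep embedding would usually be folded into `cond` as well; that part is omitted.

```python
import numpy as np

rng = np.random.default_rng(0)


def layer_norm(x, eps=1e-6):
    """LayerNorm without learned affine params; AdaLN supplies scale and shift."""
    mu = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mu) / np.sqrt(var + eps)


def softmax(x):
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)


class Linear:
    """Bias-free linear map y = x @ w."""
    def __init__(self, d_in, d_out):
        self.w = rng.normal(0.0, d_in ** -0.5, (d_in, d_out))

    def __call__(self, x):
        return x @ self.w


class LoRALinear:
    """Hypothetical: frozen base Linear plus trainable low-rank update (alpha / rank) * x @ a @ b."""
    def __init__(self, base, rank=4, alpha=8.0):
        d_in, d_out = base.w.shape
        self.base = base                              # pretrained weights, kept frozen
        self.a = rng.normal(0.0, 0.01, (d_in, rank))  # trainable down-projection
        self.b = np.zeros((rank, d_out))              # trainable up-projection, zero-init
        self.scale = alpha / rank

    def __call__(self, x):
        return self.base(x) + self.scale * (x @ self.a @ self.b)


def attention(q_in, kv_in, proj):
    """Single-head scaled dot-product attention: q_in tokens attend to kv_in tokens."""
    wq, wk, wv, wo = proj
    q, k, v = wq(q_in), wk(kv_in), wv(kv_in)
    return wo(softmax(q @ k.T / np.sqrt(q.shape[-1])) @ v)


class DecoBlockSketch:
    """Hypothetical block: joint self-attn (image+action), AdaLN conditioning, tactile cross-attn."""
    def __init__(self, d, d_cond):
        self.self_proj = [Linear(d, d) for _ in range(4)]   # q, k, v, out
        self.cross_proj = [Linear(d, d) for _ in range(4)]  # q, k, v, out
        self.mlp_in, self.mlp_out = Linear(d, 4 * d), Linear(4 * d, d)
        self.ada = Linear(d_cond, 6 * d)  # cond -> (shift, scale, gate) for attn and MLP

    def __call__(self, img, act, tactile, cond):
        n_img = img.shape[0]
        x = np.concatenate([img, act], axis=0)  # one joint token sequence
        sh1, sc1, g1, sh2, sc2, g2 = np.split(self.ada(cond), 6)
        # Joint self-attention: image and action tokens attend to each other,
        # with AdaLN modulation derived from the proprioceptive/optional condition.
        h = layer_norm(x) * (1 + sc1) + sh1
        x = x + g1 * attention(h, h, self.self_proj)
        # Tactile tokens enter only as keys/values of a cross-attention layer.
        x = x + attention(layer_norm(x), tactile, self.cross_proj)
        # AdaLN-modulated ReLU MLP.
        h = layer_norm(x) * (1 + sc2) + sh2
        x = x + g2 * self.mlp_out(np.maximum(self.mlp_in(h), 0.0))
        return x[:n_img], x[n_img:]


d, d_cond = 32, 16
block = DecoBlockSketch(d, d_cond)
img = rng.normal(size=(10, d))      # 10 image tokens
act = rng.normal(size=(8, d))       # 8 (noisy) action tokens
tactile = rng.normal(size=(5, d))   # 5 tactile tokens, assumed already embedded to width d
cond = rng.normal(size=(d_cond,))   # embedded proprioceptive state (+ optional conditions)
img_out, act_out = block(img, act, tactile, cond)
print(img_out.shape, act_out.shape)  # (10, 32) (8, 32)

# Fine-tuning: wrap a pretrained projection with LoRA. Because b starts at zero,
# the adapted block initially reproduces the pretrained outputs exactly.
block.mlp_in = LoRALinear(block.mlp_in)
img_ft, act_ft = block(img, act, tactile, cond)
print(np.allclose(act_out, act_ft))  # True
```

Zero-initializing one of the two low-rank factors is the usual LoRA choice, because fine-tuning then starts from exactly the pretrained function. The caption does not say which DECO layers actually carry adapters, so wrapping `mlp_in` here is only for illustration.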