Learning soft continuum robot (SCR) dynamics from video offers flexibility, but existing methods lack interpretability or rely on prior assumptions, while model-based approaches require prior knowledge and manual design. We bridge this gap with two contributions. (1) The Attention Broadcast Decoder (ABCD) is a plug-and-play module for autoencoder-based latent dynamics learning that generates pixel-accurate attention maps localizing each latent dimension's contribution while filtering static backgrounds; this enables visual interpretability via spatially grounded latents and on-image overlays. (2) Visual Oscillator Networks (VONs) are 2D latent oscillator networks coupled to ABCD attention maps, which allows learned masses, coupling stiffness, and forces to be visualized on the image and thereby enables mechanical interpretability. We validate our approach on single- and two-segment SCRs, demonstrating that ABCD-based models significantly improve multi-step prediction accuracy, with a 5.8x error reduction for Koopman operators and a 3.5x error reduction for oscillator networks on a two-segment robot. VONs autonomously discover a chain structure of oscillators. This fully data-driven approach yields compact, mechanically interpretable models with potential relevance for future control applications.