This paper presents a novel approach for representing proprioceptive time-series data from quadruped robots as structured two-dimensional images, enabling the use of convolutional neural networks for learning locomotion-related tasks. The proposed method encodes the temporal dynamics of multiple proprioceptive signals, such as joint positions, IMU readings, and foot velocities, while preserving the robot's morphological structure in the spatial arrangement of the image. This transformation captures inter-signal correlations and gait-dependent patterns, providing a richer feature space than direct time-series processing. We apply this concept to the problem of contact estimation, a key capability for stable and adaptive locomotion on diverse terrains. Experimental evaluations on both real-world datasets and simulated environments show that our image-based representation consistently improves prediction accuracy and generalization over conventional sequence-based models, underscoring the potential of cross-modal encoding strategies for robotic state learning. On the contact dataset, our method improves contact state accuracy from 87.7% to 94.5% over the recently proposed MI-HGNN method, while using a window 15 times shorter.
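To make the idea of a morphology-preserving image concrete, the following is a minimal sketch, not the authors' implementation. It assumes a hypothetical layout in which each row is one proprioceptive channel, rows are grouped by leg in a fixed order, and columns are time steps within the window. The names `LEG_ORDER`, `normalize`, and `window_to_image`, the per-row min-max scaling, and the toy signal values are all illustrative assumptions; the abstract does not specify the actual arrangement or normalization.

```python
from typing import Dict, List, Sequence

# Hypothetical leg ordering (front-left, front-right, rear-left, rear-right).
# The abstract does not state the layout actually used.
LEG_ORDER = ("FL", "FR", "RL", "RR")


def normalize(row: Sequence[float]) -> List[float]:
    """Min-max scale one signal to [0, 1]; a constant signal maps to all zeros."""
    lo, hi = min(row), max(row)
    span = hi - lo
    if span == 0:
        return [0.0 for _ in row]
    return [(v - lo) / span for v in row]


def window_to_image(
    signals: Dict[str, List[Sequence[float]]], window: int
) -> List[List[float]]:
    """Stack the last `window` samples of every per-leg channel into a 2D grid.

    Rows are grouped by leg in LEG_ORDER, so channels of the same leg stay
    adjacent; columns are time steps. This grouping is an assumed stand-in
    for "preserving the morphological structure".
    """
    if window <= 0:
        raise ValueError("window must be positive")
    image: List[List[float]] = []
    for leg in LEG_ORDER:
        for channel in signals[leg]:
            if len(channel) < window:
                raise ValueError("channel shorter than window")
            image.append(normalize(channel[-window:]))
    return image


# Toy data: 3 channels per leg (e.g. three joint positions), 10 time steps each.
signals = {
    leg: [[float(t + j) for t in range(10)] for j in range(3)]
    for leg in LEG_ORDER
}
image = window_to_image(signals, window=4)  # 12 rows (4 legs x 3 channels), 4 columns
```

The resulting grid can be passed to a convolutional network as a single-channel image. With this layout, a 2D kernel spans neighbouring channels of one leg and neighbouring time steps at once, which is one plausible way the inter-signal correlations mentioned above could be exposed to the network.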