The predictive brain hypothesis suggests that perception can be interpreted as the process of minimizing the error between predicted perception tokens generated by an internal world model and actual sensory input tokens. When implementing working examples of this hypothesis in the context of in-air sonar, significant difficulties arise due to the sparse nature of the reflection model that governs ultrasonic sensing. Despite these challenges, creating consistent world models using sonar data is crucial for implementing predictive processing of ultrasound data in robotics. In an effort to enable robust robot behavior using ultrasound as the sole exteroceptive sensor modality, this paper introduces EchoPT, a pretrained transformer architecture designed to predict 2D sonar images from previous sensory data and robot ego-motion information. We detail the transformer architecture that drives EchoPT and compare the performance of our model to several state-of-the-art techniques. In addition to presenting and evaluating our EchoPT model, we demonstrate the effectiveness of this predictive perception approach in two robotic tasks.