This paper contributes a novel learning-based method for aggressive task-driven compression of depth images and their encoding as images tailored to collision prediction for robotic systems. A 3D image processing methodology is proposed that accounts for the robot's size in order to appropriately "inflate" the obstacles represented in the depth image, yielding the distance that the robot can traverse collision-free along any given ray within the camera frustum. The resulting depth-and-collision image pairs are used to train a neural network that follows the Variational Autoencoder architecture: it compresses and transforms the information in the original depth image into a latent representation that encodes the collision information for that depth image. We compare the proposed task-driven encoding method with classical task-agnostic methods and demonstrate superior performance on collision image prediction from extremely low-dimensional latent spaces. A set of comparative studies shows that the proposed approach encodes depth-and-collision image pairs from complex scenes with thin obstacles at long distances better than the classical methods, at compression ratios as high as 4050:1.
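To make the notion of a collision image concrete, the following is a minimal brute-force sketch and not necessarily the procedure used in the paper. It assumes a pinhole camera, a depth image that stores distance along the optical axis, and a robot approximated by a sphere; the function name `collision_image` and all of its parameters are hypothetical. For every pixel ray it computes how far the sphere's center can travel from the camera origin before the sphere touches any back-projected obstacle point.

```python
import numpy as np


def collision_image(depth, fx, fy, cx, cy, robot_radius, max_range):
    """Per-ray collision-free distance for a spherical robot (illustrative only).

    Assumptions (not taken from the paper): pinhole intrinsics fx, fy, cx, cy;
    depth holds z along the optical axis; pixels with depth >= max_range or
    invalid depth contain no obstacle.
    """
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))

    # Unnormalised ray through each pixel, with z = 1.
    rays = np.stack([(u - cx) / fx, (v - cy) / fy, np.ones((h, w))], axis=-1)

    # Back-project valid depth pixels to 3D obstacle points.
    valid = np.isfinite(depth) & (depth > 0) & (depth < max_range)
    points = rays[valid] * depth[valid][:, None]            # (N, 3)
    if points.shape[0] == 0:
        return np.full((h, w), float(max_range))

    # Unit ray directions, flattened to (R, 3).
    dirs = (rays / np.linalg.norm(rays, axis=-1, keepdims=True)).reshape(-1, 3)

    # The sphere centre at t * d touches point p when |t d - p| = r, i.e.
    # t^2 - 2 t (d . p) + |p|^2 - r^2 = 0. Take the smallest non-negative root.
    b = dirs @ points.T                                      # (R, N), d . p
    pp = np.sum(points ** 2, axis=1)                         # (N,), |p|^2
    r2 = robot_radius ** 2
    disc = b ** 2 - (pp - r2)[None, :]
    t = np.where(disc >= 0, b - np.sqrt(np.maximum(disc, 0.0)), np.inf)

    # Negative root: already in collision if the point lies inside the sphere
    # at the origin; otherwise the point is behind the ray and is ignored.
    inside = (pp <= r2)[None, :]
    t = np.where(t < 0, np.where(inside, 0.0, np.inf), t)

    # Closest contact over all obstacle points, clipped to the sensor range.
    return np.minimum(t.min(axis=1), max_range).reshape(h, w)
```

With a single obstacle pixel in an otherwise empty depth image, neighboring rays that pass within `robot_radius` of the obstacle also receive a reduced distance, which is the "inflation" effect described above. The all-pairs comparison costs memory proportional to the number of rays times the number of obstacle points, so this sketch is only practical for small images.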