In this paper, we investigate a scenario in which a robot learns a low-dimensional representation of a door from a video of the door opening or closing. This representation can be used to infer door-related parameters and to predict the outcomes of interacting with the door. Current machine-learning approaches in the door domain rely primarily on labelled datasets. However, the large quantity of available door data suggests that a semi-supervised approach based on pretraining is feasible. To exploit the hierarchical structure of the dataset, in which each door has multiple associated images, we pretrain with a structured latent variable model known as a neural statistician. The neural statistician enforces separation between shared context-level variables (common across all images associated with the same door) and instance-level variables (unique to each individual image). We first demonstrate that the neural statistician learns an embedding that enables reconstruction and sampling of realistic door images. We then evaluate how well the learned embeddings correspond to human-interpretable parameters in a series of supervised inference tasks. We found that a pretrained neural statistician encoder outperformed analogous context-free baselines when predicting door handedness, size, angle location, and configuration from door images. Finally, in a visual bandit door-opening task with a variety of door configurations, we found that neural statistician embeddings achieved lower regret than context-free baselines.
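To make the context-level versus instance-level split concrete, the following is a minimal sketch rather than the model used in this work. It assumes a hypothetical linear per-image encoder (`encode_image`), mean pooling to form the shared context, and simple residuals as a toy stand-in for instance-level variables; the actual neural statistician learns these quantities as latent variables with neural networks.

```python
import random
from statistics import fmean

def encode_image(image, weights):
    """Hypothetical per-image encoder: a linear map from pixels to features."""
    return [sum(w * x for w, x in zip(row, image)) for row in weights]

def context_embedding(features):
    """Mean-pool per-image features into one vector shared by the whole set.

    Pooling makes the context independent of the order of the images.
    """
    return [fmean(dim) for dim in zip(*features)]

def instance_embeddings(features, context):
    """Toy instance-level variables: what each image adds beyond the context."""
    return [[f - c for f, c in zip(feat, context)] for feat in features]

# Illustrative data (assumption): 5 "images" of one door, 4 pixels each.
rng = random.Random(0)
weights = [[rng.uniform(-1, 1) for _ in range(4)] for _ in range(3)]
door_images = [[rng.random() for _ in range(4)] for _ in range(5)]

features = [encode_image(img, weights) for img in door_images]
context = context_embedding(features)               # one vector per door
instances = instance_embeddings(features, context)  # one vector per image
```

In this sketch the context vector is unchanged if the images of a door are reordered, which is the property that lets it summarize the door rather than any single frame.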