Openable part detection is the task of detecting the openable parts of an object in a single-view image and predicting the corresponding motion parameters. Prior work investigated the unrealistic setting in which every input image contains only a single openable object. We generalize this task to scenes with multiple objects, each potentially possessing openable parts, and create a corresponding dataset based on real-world scenes. We then address this more challenging scenario with OPDFormer, a part-aware transformer architecture. Our experiments show that the OPDFormer architecture significantly outperforms prior work. The more realistic multiple-object scenarios we investigate remain challenging for all methods, indicating opportunities for future work.