Rearranging objects (e.g., vases and doors) back into their original positions is one of the most fundamental skills for domestic service robots (DSRs). In rearrangement tasks, it is crucial to detect which objects need to be rearranged by comparing the goal and current states. In this study, we focus on Rearrangement Target Detection (RTD), in which the model generates a change mask for the objects that should be rearranged. Although Scene Change Detection (SCD) has been studied extensively, most SCD methods fail to segment objects with complex shapes and cannot detect changes in the angle of objects that can be opened or closed, such as doors. In this study, we propose the Co-Scale Cross-Attentional Transformer for RTD. We introduce the Serial Encoder, which consists of a sequence of serial blocks, and the Cross-Attentional Encoder, which models the relationship between the goal and current states. We built a new dataset consisting of RGB images of the goal and current states together with the corresponding change masks. We validated our method on this dataset, and the results demonstrate that it outperforms baseline methods in terms of $F_1$-score and mean IoU.
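The abstract does not specify the internals of the Cross-Attentional Encoder or exactly how the metrics are averaged, so the NumPy sketch below is only a minimal illustration under stated assumptions, not the authors' implementation. It assumes single-head scaled dot-product cross-attention in which current-state tokens act as queries and goal-state tokens act as keys and values; the projection matrices `w_q`, `w_k`, `w_v` are hypothetical random stand-ins for learned weights. It also assumes that the $F_1$-score is computed on the "changed" class and that mean IoU averages the IoU of the changed and unchanged classes, a common convention in change detection.

```python
import numpy as np


def softmax(x: np.ndarray, axis: int = -1) -> np.ndarray:
    # Subtract the row maximum for numerical stability before exponentiating.
    z = x - x.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)


def cross_attention(current, goal, w_q, w_k, w_v):
    """Single-head scaled dot-product cross-attention (assumed design).

    current: (N, C) flattened feature tokens of the current-state image (queries).
    goal:    (M, C) flattened feature tokens of the goal-state image (keys/values).
    w_q, w_k, w_v: (C, D) hypothetical projection matrices.
    Returns the attended features (N, D) and the attention weights (N, M).
    """
    q = current @ w_q
    k = goal @ w_k
    v = goal @ w_v
    scores = q @ k.T / np.sqrt(q.shape[-1])  # (N, M) similarity of each current token to each goal token
    attn = softmax(scores, axis=-1)          # each row sums to 1
    return attn @ v, attn


def f1_and_miou(pred: np.ndarray, target: np.ndarray):
    """F1 on the 'changed' class and mean IoU over {unchanged, changed}.

    pred, target: boolean change masks of the same shape.
    """
    tp = np.sum(pred & target)
    fp = np.sum(pred & ~target)
    fn = np.sum(~pred & target)
    tn = np.sum(~pred & ~target)
    f1 = 2 * tp / (2 * tp + fp + fn) if (2 * tp + fp + fn) > 0 else 1.0
    iou_changed = tp / (tp + fp + fn) if (tp + fp + fn) > 0 else 1.0
    iou_unchanged = tn / (tn + fp + fn) if (tn + fp + fn) > 0 else 1.0
    return float(f1), float((iou_changed + iou_unchanged) / 2)


# Usage with toy data (random features, tiny hand-written masks).
rng = np.random.default_rng(0)
C, D = 8, 4
current = rng.normal(size=(16, C))  # e.g. a 4x4 feature map, flattened
goal = rng.normal(size=(16, C))
w_q, w_k, w_v = (rng.normal(size=(C, D)) for _ in range(3))
out, attn = cross_attention(current, goal, w_q, w_k, w_v)
print(out.shape, attn.shape)  # (16, 4) (16, 16)

pred = np.array([[1, 1, 0], [0, 0, 0]], dtype=bool)
target = np.array([[1, 0, 0], [0, 0, 1]], dtype=bool)
f1, miou = f1_and_miou(pred, target)
print(f1, round(miou, 4))  # 0.5 0.4667
```

Letting the current state query the goal state means each current-state location gathers evidence from wherever its counterpart appears in the goal image, which is one plausible way to relate the two states when objects have moved or rotated; the actual encoder may differ.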