Existing view-based methods excel at recognizing 3D objects from predefined viewpoints, but their ability to recognize objects under arbitrary views remains underexplored. This is a challenging yet realistic setting, because each object is observed from a different number and placement of viewpoints, and its pose is not aligned. Most view-based methods, which aggregate multiple view features into a global feature representation, struggle to address 3D object recognition under arbitrary views: because inputs from arbitrary views are unaligned, robust feature aggregation is difficult, leading to performance degradation. In this paper, we introduce a novel Part-aware Network (PANet), a part-based representation, to address these issues. This part-based representation localizes and understands different parts of 3D objects, such as airplane wings and tails. Its properties, including viewpoint invariance and rotation robustness, give it an advantage in 3D object recognition under arbitrary views. Results on benchmark datasets clearly demonstrate that our proposed method outperforms existing view-based aggregation baselines for 3D object recognition under arbitrary views, and even surpasses most fixed-viewpoint methods.