Open-Pose 3D Zero-Shot Learning: Benchmark and Challenges

With the explosive growth of 3D data, the need for zero-shot learning to ease data labeling has become evident. Recently, methods that transfer language or language-image pre-training models, such as Contrastive Language-Image Pre-training (CLIP), to 3D vision have made significant progress on the 3D zero-shot classification task. These methods primarily focus on classifying 3D objects in an aligned pose. This setting is rather restrictive: it overlooks the recognition of 3D objects in open poses, which are typical of real-world scenarios, such as an overturned chair or a teddy bear lying down. To this end, we propose a more realistic and challenging scenario named open-pose 3D zero-shot classification, which focuses on recognizing 3D objects regardless of their orientation. First, we revisit current research on 3D zero-shot classification and propose two benchmark datasets specifically designed for the open-pose setting. We empirically evaluate many of the most popular methods on the proposed open-pose benchmark. Our investigations reveal that most current 3D zero-shot classification models perform poorly in this setting, indicating substantial room for exploration in this new direction. Furthermore, we study a concise pipeline with an iterative angle refinement mechanism that automatically optimizes one ideal angle from which to classify these open-pose 3D objects. In particular, to make the validation more compelling and not limited to existing CLIP-based methods, we also pioneer the exploration of knowledge transfer based on diffusion models. While the proposed solutions can serve as a new benchmark for open-pose 3D zero-shot classification, we also discuss the complexities and challenges of this scenario that remain open for further research. The code is publicly available at https://github.com/weiguangzhao/Diff-OP3D.
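The abstract does not spell out how the iterative angle refinement works, so the following is only a minimal sketch of one plausible reading: a generic coarse-to-fine search over viewing angles that keeps the angle with the highest score and narrows the search window around it at each iteration. It is not the authors' implementation. The names `refine_angle` and `toy_score` are hypothetical, and the scoring function is a stand-in. In a real pipeline the score might be, for example, the confidence of a pre-trained model on a view rendered at that angle, but that is an assumption and not something stated here.

```python
import math
from typing import Callable, Tuple


def refine_angle(
    score: Callable[[float, float], float],
    iterations: int = 5,
    samples: int = 9,
) -> Tuple[float, float, float]:
    """Coarse-to-fine search for the (azimuth, elevation) pair, in degrees,
    that maximizes `score`. Generic illustration, not the paper's method."""
    center_az, center_el = 0.0, 0.0
    span_az, span_el = 360.0, 180.0  # start with the full range of angles
    best_az, best_el = center_az, center_el
    best_score = score(best_az, best_el)

    for _ in range(iterations):
        step_az = span_az / (samples - 1)
        step_el = span_el / (samples - 1)
        for i in range(samples):
            # Azimuth wraps around the full circle.
            az = (center_az - span_az / 2 + i * step_az) % 360.0
            for j in range(samples):
                # Elevation is clamped to [-90, 90].
                el = center_el - span_el / 2 + j * step_el
                el = max(-90.0, min(90.0, el))
                s = score(az, el)
                if s > best_score:
                    best_az, best_el, best_score = az, el, s
        # Re-center on the best angle and shrink the window to one
        # grid step on either side of it.
        center_az, center_el = best_az, best_el
        span_az, span_el = 2 * step_az, 2 * step_el

    return best_az, best_el, best_score


def toy_score(az: float, el: float,
              target: Tuple[float, float] = (130.0, -40.0)) -> float:
    """Hypothetical stand-in score that peaks at a hidden target pose."""
    return (math.cos(math.radians(az - target[0]))
            + math.cos(math.radians(el - target[1])))


if __name__ == "__main__":
    az, el, s = refine_angle(toy_score)
    print(f"best azimuth={az:.2f}, elevation={el:.2f}, score={s:.4f}")
```

A grid search of this kind works on the toy score because it is smooth and has a single peak. A score derived from a real model could have several local maxima, in which case a denser initial grid or multiple restarts would likely be needed.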
