3D assets have rapidly expanded in quantity and diversity due to the growing popularity of virtual reality and gaming. As a result, text-to-shape retrieval has become essential for intuitive search within large repositories. However, existing methods require canonical poses and support few object categories, limiting their real-world applicability, where objects can belong to diverse classes and appear in arbitrary orientations. To address this challenge, we propose RI-Mamba, the first rotation-invariant state-space model for point clouds. RI-Mamba defines global and local reference frames to disentangle pose from geometry and uses Hilbert sorting to construct token sequences with meaningful geometric structure while maintaining rotation invariance. We further introduce a novel strategy to compute orientational embeddings and reintegrate them via feature-wise linear modulation, effectively recovering spatial context and enhancing model expressiveness. Our strategy is inherently compatible with state-space models and operates in linear time. To scale up retrieval, we adopt cross-modal contrastive learning with automated triplet generation, allowing training on diverse datasets without manual annotation. Extensive experiments demonstrate RI-Mamba's superior representational capacity and robustness, achieving state-of-the-art performance on the OmniObject3D benchmark across more than 200 object categories under arbitrary orientations. Our code will be made available at https://github.com/ndkhanh360/RI-Mamba.git.
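The abstract describes reintegrating orientational embeddings via feature-wise linear modulation (FiLM), i.e., predicting a per-channel scale and shift from a conditioning vector and applying them to the token features. The following NumPy sketch illustrates the general FiLM mechanism only; all shapes, weights, and function names are illustrative assumptions, not the paper's actual implementation.

```python
import numpy as np

def film_modulate(features, cond_embedding, w_gamma, b_gamma, w_beta, b_beta):
    """Feature-wise linear modulation (FiLM): condition per-channel
    features on an embedding by predicting a scale (gamma) and shift (beta).
    features:       (num_tokens, channels) rotation-invariant token features
    cond_embedding: (embed_dim,) e.g. an orientational embedding
    """
    gamma = cond_embedding @ w_gamma + b_gamma  # (channels,) per-channel scale
    beta = cond_embedding @ w_beta + b_beta     # (channels,) per-channel shift
    return gamma * features + beta              # broadcast over tokens

# Toy dimensions (hypothetical): 8 tokens, 16 channels, 4-dim orientation embedding
rng = np.random.default_rng(0)
tokens = rng.normal(size=(8, 16))
orient = rng.normal(size=(4,))
w_g, b_g = rng.normal(size=(4, 16)), np.zeros(16)
w_b, b_b = rng.normal(size=(4, 16)), np.zeros(16)

out = film_modulate(tokens, orient, w_g, b_g, w_b, b_b)
print(out.shape)  # (8, 16)
```

Because the modulation is a single affine transform per token, it adds only linear-time overhead, which is consistent with the linear-time claim for the overall strategy.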