3D shape completion from partial scans remains challenging for unseen categories and noisy real-world observations, where geometry alone is often insufficient to infer the missing structure. We present DinoComplete, a deterministic and efficient shape completion framework that augments geometric reconstruction with voxel-aligned semantic priors distilled from DINO features. First, we construct multi-view DINO feature volumes aligned with ShapeNet data and train a student network to predict dense semantic features directly from incomplete shapes. These predicted features capture global structure and part-aware semantic context while remaining aligned with the underlying geometry. We then integrate the distilled features into a completion network, where geometric and semantic voxel representations are fused through voxel state-space modeling. To enable efficient long-range reasoning without sacrificing resolution, we introduce a multi-scale voxel Mamba module that refines the fused features by combining full-grid and chunk-wise sequence modeling. Experiments on unseen ShapeNet categories and ScanNet objects show that DinoComplete achieves higher completion quality than prior deterministic and generative completion methods while using fewer parameters, requiring less memory, and running inference faster. Our results demonstrate that distilling semantic priors from visual foundation models improves generalization and robustness in 3D shape completion.
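To make the distinction between full-grid and chunk-wise sequence modeling concrete, the following is a minimal NumPy sketch of how a voxel feature volume could be serialized in both ways. It is an illustration under stated assumptions, not the DinoComplete implementation: the abstract does not specify the scan order, the chunk size, or the tensor layout, so the row-major ordering, the `(D, H, W, C)` layout, and the helper names `full_grid_sequence`, `chunk_sequences`, and `unchunk` are all hypothetical. The state-space (Mamba) layers that would consume these sequences are omitted.

```python
import numpy as np


def full_grid_sequence(vol):
    """Flatten a (D, H, W, C) voxel feature volume into one (D*H*W, C) sequence.

    Assumption: plain row-major scan order; the paper may use a different order.
    """
    d, h, w, c = vol.shape
    return vol.reshape(d * h * w, c)


def chunk_sequences(vol, chunk):
    """Split a (D, H, W, C) volume into non-overlapping chunk^3 blocks.

    Returns an array of shape (num_chunks, chunk**3, C), so each block can be
    processed as its own short sequence.
    """
    d, h, w, c = vol.shape
    if d % chunk or h % chunk or w % chunk:
        raise ValueError("grid dimensions must be divisible by the chunk size")
    nd, nh, nw = d // chunk, h // chunk, w // chunk
    blocks = vol.reshape(nd, chunk, nh, chunk, nw, chunk, c)
    # Bring the three block indices to the front, the in-block offsets after.
    blocks = blocks.transpose(0, 2, 4, 1, 3, 5, 6)
    return blocks.reshape(nd * nh * nw, chunk ** 3, c)


def unchunk(seqs, shape, chunk):
    """Inverse of chunk_sequences: restore the (D, H, W, C) volume."""
    d, h, w, c = shape
    nd, nh, nw = d // chunk, h // chunk, w // chunk
    blocks = seqs.reshape(nd, nh, nw, chunk, chunk, chunk, c)
    blocks = blocks.transpose(0, 3, 1, 4, 2, 5, 6)
    return blocks.reshape(d, h, w, c)


# Toy 4x4x4 grid with 2 feature channels per voxel.
vol = np.arange(4 * 4 * 4 * 2, dtype=np.float32).reshape(4, 4, 4, 2)
full = full_grid_sequence(vol)       # one long sequence over the whole grid
chunks = chunk_sequences(vol, 2)     # eight short sequences, one per 2x2x2 block
restored = unchunk(chunks, vol.shape, 2)
```

In this sketch the full-grid sequence gives a sequence model access to every voxel at once, while the chunk-wise sequences keep each sequence short and local. A multi-scale module in the spirit of the abstract would process both views and merge the results back on the voxel grid, which is why an exact inverse such as `unchunk` is useful.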