CLIDD: Cross-Layer Independent Deformable Description for Efficient and Discriminative Local Feature Representation

Robust local feature representations are essential for spatial intelligence tasks such as robot navigation and augmented reality. Establishing reliable correspondences requires descriptors that are both highly discriminative and computationally efficient. To address this, we introduce Cross-Layer Independent Deformable Description (CLIDD), a method that achieves superior distinctiveness by sampling descriptors directly from independent feature hierarchies. The approach uses learnable offsets to capture fine-grained structural details across scales while avoiding the computational cost of building a unified dense representation. To ensure real-time performance, we implement a hardware-aware kernel fusion strategy that maximizes inference throughput. We further develop a scalable framework that combines lightweight architectures with a training protocol leveraging both metric learning and knowledge distillation, yielding a wide spectrum of model variants optimized for diverse deployment constraints. Extensive evaluations demonstrate that our approach achieves both superior matching accuracy and exceptional computational efficiency. Specifically, the ultra-compact variant matches the precision of SuperPoint with only 0.004M parameters, a 99.7% reduction in model size. At the other end of the spectrum, our high-performance configuration outperforms all current state-of-the-art methods, including high-capacity DINOv2-based frameworks, while exceeding 200 FPS on edge devices. These results show that CLIDD delivers high-precision local feature matching with minimal computational overhead, providing a robust and scalable solution for real-time spatial intelligence tasks.
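
To make the core idea concrete, the following is a minimal sketch of cross-layer deformable sampling: a descriptor is assembled by sampling each feature layer independently at a keypoint plus a few offsets, without first fusing the layers into one dense map. It is not the authors' implementation. The function names (`bilinear`, `describe`), the averaging of samples, the simple `keypoint / stride` coordinate mapping, and the hand-written offsets (which a trained model would predict) are all assumptions made for illustration.

```python
from math import sqrt


def bilinear(fmap, x, y):
    """Bilinearly sample a C x H x W feature map (nested lists) at (x, y)."""
    h, w = len(fmap[0]), len(fmap[0][0])
    # Clamp so that an offset can never sample outside the map.
    x = min(max(x, 0.0), w - 1.0)
    y = min(max(y, 0.0), h - 1.0)
    x0, y0 = int(x), int(y)
    x1, y1 = min(x0 + 1, w - 1), min(y0 + 1, h - 1)
    ax, ay = x - x0, y - y0
    return [
        (1 - ay) * ((1 - ax) * ch[y0][x0] + ax * ch[y0][x1])
        + ay * ((1 - ax) * ch[y1][x0] + ax * ch[y1][x1])
        for ch in fmap
    ]


def describe(keypoint, pyramid, offsets):
    """Build one descriptor from independent layers.

    pyramid: list of (fmap, stride) pairs, one per layer.
    offsets: per layer, a list of (dx, dy) in that layer's pixel units.
             Hypothetical stand-in for offsets a network would predict.
    """
    kx, ky = keypoint
    desc = []
    for (fmap, stride), layer_offsets in zip(pyramid, offsets):
        # Assumed mapping from image coordinates to this layer's grid.
        cx, cy = kx / stride, ky / stride
        samples = [bilinear(fmap, cx + dx, cy + dy) for dx, dy in layer_offsets]
        # Assumed aggregation: per-channel mean over the sampled points.
        for c in range(len(fmap)):
            desc.append(sum(s[c] for s in samples) / len(samples))
    # L2-normalize the concatenated cross-layer descriptor.
    norm = sqrt(sum(v * v for v in desc)) or 1.0
    return [v / norm for v in desc]


# Toy two-layer pyramid: a fine 1-channel 4x4 map and a coarse 2-channel 2x2 map.
fine = [[[float(x) for x in range(4)] for _ in range(4)]]
coarse = [
    [[1.0, 1.0], [1.0, 1.0]],
    [[0.0, 0.0], [1.0, 1.0]],
]
pyramid = [(fine, 1), (coarse, 2)]
offsets = [[(0.5, 0.0), (-0.5, 0.0)], [(0.0, 0.0), (0.0, -1.0)]]
desc = describe((2.0, 2.0), pyramid, offsets)
```

In this sketch, each layer is touched only at a handful of sampled locations per keypoint, so the cost scales with the number of keypoints and offsets rather than with the resolution of a fused dense descriptor map.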
