High-resolution geometric prediction is essential for robust perception in autonomous driving, robotics, and AR/MR, but current foundation models scale poorly to real-world, high-resolution inputs. Running these models directly on 2K images incurs prohibitive compute and memory costs, which makes practical deployment difficult. To address this, we present 2K Retrofit, a framework that enables efficient 2K-resolution inference for any geometric foundation model without modifying or retraining the backbone. Our approach combines fast coarse predictions with an entropy-based sparse refinement that selectively enhances high-uncertainty regions, producing high-fidelity 2K outputs with minimal overhead. Extensive experiments on widely used benchmarks demonstrate that 2K Retrofit consistently achieves state-of-the-art accuracy and speed, bridging the gap between research advances and scalable deployment in high-resolution 3D vision applications. Code will be released upon acceptance.
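To make the idea of entropy-based sparse selection concrete, the following is a minimal sketch and not the 2K Retrofit implementation. The abstract does not specify how entropy is computed or how regions are chosen, so everything here is an assumption made for illustration: the coarse prediction is taken to be a per-pixel probability distribution over discrete bins, uncertainty is Shannon entropy averaged over square patches, and a fixed budget of patches is marked for refinement. The names `pixel_entropy`, `select_uncertain_patches`, `patch`, and `budget` are hypothetical.

```python
import math


def pixel_entropy(probs):
    """Shannon entropy (in nats) of one discrete distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def select_uncertain_patches(prob_map, patch, budget):
    """Return (patch_row, patch_col) of the `budget` highest-entropy patches.

    prob_map is an H x W grid whose cells hold a probability list over
    hypothetical prediction bins (an assumption, not the paper's format).
    """
    height, width = len(prob_map), len(prob_map[0])
    scores = []
    for top in range(0, height, patch):
        for left in range(0, width, patch):
            cells = [
                prob_map[r][c]
                for r in range(top, min(top + patch, height))
                for c in range(left, min(left + patch, width))
            ]
            mean_entropy = sum(pixel_entropy(p) for p in cells) / len(cells)
            scores.append((mean_entropy, top // patch, left // patch))
    # Stable sort, highest mean entropy first; ties keep raster order.
    scores.sort(key=lambda s: -s[0])
    return [(row, col) for _, row, col in scores[:budget]]


# Toy 4 x 4 coarse prediction: confident everywhere except the
# bottom-right 2 x 2 block, where the two bins are equally likely.
confident, ambiguous = [1.0, 0.0], [0.5, 0.5]
prob_map = [
    [ambiguous if (r >= 2 and c >= 2) else confident for c in range(4)]
    for r in range(4)
]
selected = select_uncertain_patches(prob_map, patch=2, budget=1)
print(selected)
```

In this toy case only the ambiguous patch is selected, so any more expensive refinement would touch a quarter of the image while the remaining pixels keep their coarse values. That is the source of the "minimal overhead" in a scheme of this kind.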