3DRot: Rediscovering the Missing Primitive for RGB-Based 3D Augmentation

RGB-based 3D tasks, e.g., 3D detection, depth estimation, and 3D keypoint estimation, still suffer from scarce, expensive annotations and a thin augmentation toolbox, since many image transforms, including rotations and warps, disrupt geometric consistency. While horizontal flipping and color jitter are standard, rigorous 3D rotation augmentation has surprisingly remained absent from RGB-based pipelines, largely due to the misconception that it requires scene depth or scene reconstruction. In this paper, we introduce 3DRot, a plug-and-play augmentation that rotates and mirrors images about the camera's optical center while synchronously updating the RGB image, camera intrinsics, object poses, and 3D annotations to preserve projective geometry. Because the camera center stays fixed, pixels move under a depth-independent homography, so 3DRot achieves geometry-consistent rotations and reflections without relying on any scene depth. We first validate 3DRot on a classical RGB-based 3D task, monocular 3D detection. On SUN RGB-D, inserting 3DRot into a frozen DINO-X + Cube R-CNN pipeline raises $IoU_{3D}$ from 43.21 to 44.51, cuts rotation error (ROT) from 22.91$^\circ$ to 20.93$^\circ$, and boosts $mAP_{0.5}$ from 35.70 to 38.11; smaller but consistent gains appear on a cross-domain IN10 split. Beyond monocular detection, adding 3DRot on top of the standard BTS augmentation schedule further improves NYU Depth v2 abs-rel from 0.1783 to 0.1685 (and $\delta < 1.25$ accuracy from 0.7472 to 0.7548), and reduces cross-dataset error on SUN RGB-D. On KITTI, applying the same camera-centric rotations in MVX-Net (LiDAR+RGB) raises moderate-difficulty 3D AP from about 63.85 to 65.16 while remaining compatible with standard 3D augmentations.
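
The depth-free claim rests on a standard pinhole-camera fact: if the camera rotates by R about its optical center, a camera-frame point X projects to K R X, so every pixel maps through the homography H = K R K^{-1}, which contains no depth term. The following is a minimal sketch of that idea, together with the matching object-pose update and a horizontal mirror. It rests on assumptions of our own: a zero-skew pinhole camera without lens distortion, intrinsics K kept fixed under rotation (the abstract says 3DRot also updates intrinsics, but not how), and hypothetical helper names such as `rotation_homography` and `mirror_pose`. It is not the authors' 3DRot implementation.

```python
import math

# Sketch only: hypothetical helper names; zero-skew pinhole camera, no distortion.
Mat = list[list[float]]  # 3x3 matrix, row-major
Vec = list[float]        # length-3 vector


def matmul(a: Mat, b: Mat) -> Mat:
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)] for i in range(3)]


def matvec(a: Mat, v: Vec) -> Vec:
    return [sum(a[i][k] * v[k] for k in range(3)) for i in range(3)]


def inv_intrinsics(K: Mat) -> Mat:
    # Closed-form inverse of a zero-skew intrinsics matrix.
    fx, fy, cx, cy = K[0][0], K[1][1], K[0][2], K[1][2]
    return [[1.0 / fx, 0.0, -cx / fx], [0.0, 1.0 / fy, -cy / fy], [0.0, 0.0, 1.0]]


def rot_x(a: float) -> Mat:
    c, s = math.cos(a), math.sin(a)
    return [[1.0, 0.0, 0.0], [0.0, c, -s], [0.0, s, c]]


def rot_y(a: float) -> Mat:
    c, s = math.cos(a), math.sin(a)
    return [[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]]


def project(K: Mat, X: Vec) -> tuple[float, float]:
    # Pinhole projection of a camera-frame point (assumes X[2] > 0).
    p = matvec(K, X)
    return p[0] / p[2], p[1] / p[2]


def rotation_homography(K: Mat, R: Mat) -> Mat:
    # H = K R K^-1 maps original pixels to pixels of the rotated view.
    # No depth appears, so the image can be warped without a depth map.
    return matmul(matmul(K, R), inv_intrinsics(K))


def apply_h(H: Mat, uv: tuple[float, float]) -> tuple[float, float]:
    p = matvec(H, [uv[0], uv[1], 1.0])
    return p[0] / p[2], p[1] / p[2]


def rotate_pose(R: Mat, R_obj: Mat, t_obj: Vec) -> tuple[Mat, Vec]:
    # Object points X = R_obj P + t_obj become R X = (R R_obj) P + R t_obj.
    return matmul(R, R_obj), matvec(R, t_obj)


MIRROR: Mat = [[-1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]  # x -> -x


def mirror_intrinsics(K: Mat, width: float) -> Mat:
    # A horizontal flip u -> width - u equals X -> -X combined with cx -> width - cx.
    K2 = [row[:] for row in K]
    K2[0][2] = width - K[0][2]
    return K2


def mirror_pose(R_obj: Mat, t_obj: Vec) -> tuple[Mat, Vec]:
    # M R_obj M has det +1, so it is still a proper rotation. The local corners
    # become M P, which leaves an origin-centred axis-aligned box unchanged.
    return matmul(matmul(MIRROR, R_obj), MIRROR), matvec(MIRROR, t_obj)


if __name__ == "__main__":
    K = [[500.0, 0.0, 320.0], [0.0, 500.0, 240.0], [0.0, 0.0, 1.0]]
    R = matmul(rot_y(math.radians(10.0)), rot_x(math.radians(5.0)))
    H = rotation_homography(K, R)
    # Points at very different depths: one H serves all of them.
    for X in ([0.5, -0.2, 2.0], [-1.0, 0.3, 5.0], [0.1, 0.1, 10.0]):
        direct = project(K, matvec(R, X))
        via_h = apply_h(H, project(K, X))
        print(f"direct=({direct[0]:.3f}, {direct[1]:.3f})  via H=({via_h[0]:.3f}, {via_h[1]:.3f})")
```

Running the demo prints matching pixel pairs (up to floating-point rounding) for points at depths 2, 5, and 10, showing that a single warp is valid at every depth. Warping an actual RGB image would additionally need a resampling step, such as an inverse-mapped bilinear lookup, which this sketch omits.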
