Automatic image rotation estimation is a key preprocessing step in many vision pipelines. This task is challenging because angles have circular topology, creating boundary discontinuities that hinder standard regression methods. We present a comprehensive study of five circular-aware methods for global orientation estimation: direct angle regression with circular loss, classification via angular binning, unit-vector regression, phase-shifting coder, and circular Gaussian distribution. Using transfer learning from ImageNet-pretrained models, we systematically evaluate these methods across sixteen modern architectures by adapting their output heads for rotation-specific predictions. Our results show that probabilistic methods, particularly the circular Gaussian distribution, are the most robust across architectures, while classification achieves the best accuracy on well-matched backbones but suffers from training instabilities on others. The best configuration (classification with EfficientViT-B3) achieves a mean absolute error (MAE) of 1.23° (mean across five independent runs) on the DRC-D dataset, while the circular Gaussian distribution with MambaOut Base achieves a virtually identical 1.24° with greater robustness across backbones. When our top-performing method-architecture combinations are trained and evaluated on COCO 2014, the best configuration reaches 3.71° MAE, improving substantially over prior work, with further improvement to 2.84° on the larger COCO 2017 dataset.
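To illustrate the boundary discontinuity mentioned above, the following minimal sketch compares a naive absolute difference with a wrap-aware angular error, and shows one possible unit-vector encoding of an angle. It is not taken from the study itself: the function names (`circular_abs_error`, `encode_unit_vector`, `decode_unit_vector`) and the choice of degrees in [0, 360) are assumptions made purely for illustration.

```python
import numpy as np

def circular_abs_error(pred_deg, true_deg):
    """Smallest absolute angular difference in degrees, in [0, 180].

    Hypothetical helper for illustration; not the study's implementation.
    """
    diff = np.asarray(pred_deg, dtype=float) - np.asarray(true_deg, dtype=float)
    # Wrap the signed difference into [-180, 180) before taking |.|
    wrapped = (diff + 180.0) % 360.0 - 180.0
    return np.abs(wrapped)

def encode_unit_vector(angle_deg):
    """Map an angle to (cos, sin), a target with no jump at 0/360 degrees."""
    rad = np.deg2rad(angle_deg)
    return np.stack([np.cos(rad), np.sin(rad)], axis=-1)

def decode_unit_vector(vec):
    """Recover an angle in [0, 360) from a (cos, sin) pair via arctan2."""
    vec = np.asarray(vec, dtype=float)
    return np.rad2deg(np.arctan2(vec[..., 1], vec[..., 0])) % 360.0

# Two orientations that are only 2 degrees apart across the wrap-around point.
pred, true = 359.0, 1.0
naive_error = abs(pred - true)                # treats the angles as 358 degrees apart
circ_error = circular_abs_error(pred, true)   # respects the circular topology

# Encoding then decoding should return the original angles (up to rounding).
angles = np.array([0.0, 90.0, 270.0])
roundtrip = decode_unit_vector(encode_unit_vector(angles))
```

Under these assumptions, a naive regression loss would penalize the 359° versus 1° pair as if it were a large error, whereas the wrapped error and the (cos, sin) target both treat the two angles as close neighbours.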