A New Dataset and Framework for Robust Road Surface Classification via Camera-IMU Fusion

Willams de Lima Costa, Thifany Ketuli Silva de Souza, Jonas Ferreira Silva, Carlos Gabriel Bezerra Pereira, Bruno Reis Vila Nova, Leonardo Silvino Brito, Rafael Raider Leoni, Juliano Silva, Valter Ferreira, Sibele Miguel Soares Neto, Samantha Uehara, Daniel Giacomo, João Marcelo Teixeira, Veronica Teichrieb, Cristiano Coelho de Araújo

Road surface classification (RSC) is a key enabler for environment-aware predictive maintenance systems. However, existing RSC techniques often fail to generalize beyond narrow operational conditions because they rely on limited sensing modalities and on datasets that lack environmental diversity. This work addresses these limitations by introducing a multimodal framework that fuses images and inertial measurements using a lightweight bidirectional cross-attention module, followed by an adaptive gating layer that adjusts modality contributions under domain shifts. Because current benchmarks are limited, especially in their lack of variability, we introduce ROAD, a new dataset composed of three complementary subsets: (i) real-world multimodal recordings with RGB-IMU streams synchronized using a gold-standard industry datalogger, captured across diverse lighting, weather, and surface conditions; (ii) a large vision-only subset designed to assess robustness under adverse illumination and heterogeneous capture setups; and (iii) a synthetic subset generated to study out-of-distribution generalization in scenarios that are difficult to obtain in practice. Experiments show that our method achieves a 1.4 percentage point (pp) improvement over the previous state of the art on the PVS benchmark and an 11.6 pp improvement on our multimodal ROAD subset, with consistently higher F1-scores on minority classes. The framework also demonstrates stable performance across challenging visual conditions, including nighttime, heavy rain, and mixed-surface transitions. These findings indicate that combining affordable camera and IMU sensors with multimodal attention mechanisms provides a scalable, robust foundation for road surface understanding, which is particularly relevant for regions where environmental variability and cost constraints limit the adoption of high-end sensing suites.
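
The fusion idea described in the abstract (bidirectional cross-attention followed by an adaptive gate) can be illustrated with a minimal sketch. This is not the authors' implementation: the function names (`cross_attend`, `fuse`), the absence of learned projections, the residual connection, the mean pooling, and the scalar sigmoid gate are all assumptions made for illustration, and both modalities are assumed to be already embedded in a shared dimension.

```python
import math

def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def cross_attend(queries, context):
    """Scaled dot-product attention of each query token over the context
    tokens, with a residual connection. No learned projections (assumption)."""
    dim = len(queries[0])
    scale = math.sqrt(dim)
    out = []
    for q in queries:
        weights = softmax([dot(q, k) / scale for k in context])
        attended = [sum(w * v[d] for w, v in zip(weights, context))
                    for d in range(dim)]
        out.append([q[d] + attended[d] for d in range(dim)])
    return out

def mean_pool(tokens):
    n = len(tokens)
    return [sum(t[d] for t in tokens) / n for d in range(len(tokens[0]))]

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def fuse(image_tokens, imu_tokens, gate_weights, gate_bias=0.0):
    """Hypothetical fusion step: attend in both directions, pool each stream,
    then blend the two summaries with a scalar gate g in (0, 1)."""
    img_summary = mean_pool(cross_attend(image_tokens, imu_tokens))
    imu_summary = mean_pool(cross_attend(imu_tokens, image_tokens))
    # The gate is computed from the concatenated summaries (assumption).
    g = sigmoid(dot(gate_weights, img_summary + imu_summary) + gate_bias)
    fused = [g * a + (1.0 - g) * b for a, b in zip(img_summary, imu_summary)]
    return fused, g

# Toy inputs: two image tokens and one IMU token in a 2-D embedding space.
image_tokens = [[1.0, 0.0], [0.0, 1.0]]
imu_tokens = [[0.5, 0.5]]
fused, gate = fuse(image_tokens, imu_tokens, gate_weights=[0.0, 0.0, 0.0, 0.0])
```

With zero gate weights the gate evaluates to 0.5, so both modalities contribute equally. In a trained model the gate parameters would be learned, which is what would let the fused representation lean on the IMU stream when the image is degraded (for example at night) and on the camera otherwise.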
