StereoAdapter-2: Globally Structure-Consistent Underwater Stereo Depth Estimation

Stereo depth estimation is fundamental to underwater robotic perception, yet it suffers from severe domain shifts caused by wavelength-dependent light attenuation, scattering, and refraction. Recent approaches leverage monocular foundation models with GRU-based iterative refinement for underwater adaptation; however, the sequential gating and local convolutional kernels in GRUs require multiple iterations to propagate disparity over long ranges, which limits performance in large-disparity and textureless underwater regions. In this paper, we propose StereoAdapter-2, which replaces the conventional ConvGRU updater with a novel ConvSS2D operator based on selective state space models. The proposed operator employs a four-directional scanning strategy that naturally aligns with epipolar geometry while capturing vertical structural consistency, enabling efficient long-range spatial propagation within a single update step at linear computational complexity. Furthermore, we construct UW-StereoDepth-80K, a large-scale synthetic underwater stereo dataset featuring diverse baselines, attenuation coefficients, and scattering parameters, built through a two-stage generative pipeline that combines semantic-aware style transfer and geometry-consistent novel view synthesis. Combined with dynamic LoRA adaptation inherited from StereoAdapter, our framework achieves state-of-the-art zero-shot performance on underwater benchmarks, with a 17% improvement on TartanAir-UW and a 7.2% improvement on SQUID. Real-world validation on the BlueROV2 platform further demonstrates the robustness of our approach. Code: https://github.com/AIGeeksGroup/StereoAdapter-2. Website: https://aigeeksgroup.github.io/StereoAdapter-2.
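To give a rough intuition for the four-directional scanning idea, the following minimal sketch runs a fixed-decay linear recurrence over a small 2D grid in four directions (left-to-right, right-to-left, top-to-bottom, bottom-to-top) and averages the results. It is not the ConvSS2D operator: the constant `decay`, the averaging step, and the names `scan_1d` and `four_direction_scan` are assumptions made for illustration, whereas a selective state space model would use learned, input-dependent parameters. The sketch only shows that one pass per direction visits each pixel once and still lets a single pixel influence its entire row and column.

```python
from typing import List

Grid = List[List[float]]


def scan_1d(seq: List[float], decay: float) -> List[float]:
    """Run the recurrence h[t] = decay * h[t-1] + x[t] once over seq."""
    out = []
    h = 0.0
    for x in seq:
        h = decay * h + x
        out.append(h)
    return out


def transpose(grid: Grid) -> Grid:
    """Swap rows and columns of a rectangular grid."""
    return [list(col) for col in zip(*grid)]


def four_direction_scan(grid: Grid, decay: float = 0.5) -> Grid:
    """Average four directional scans of a 2D grid (illustrative only).

    Horizontal scans run along rows, which is where disparity lives in a
    rectified stereo pair; vertical scans run along columns.
    """
    # Horizontal passes: left-to-right and right-to-left along each row.
    lr = [scan_1d(row, decay) for row in grid]
    rl = [scan_1d(row[::-1], decay)[::-1] for row in grid]

    # Vertical passes: top-to-bottom and bottom-to-top along each column.
    cols = transpose(grid)
    tb = transpose([scan_1d(col, decay) for col in cols])
    bt = transpose([scan_1d(col[::-1], decay)[::-1] for col in cols])

    height, width = len(grid), len(grid[0])
    return [
        [(lr[i][j] + rl[i][j] + tb[i][j] + bt[i][j]) / 4.0 for j in range(width)]
        for i in range(height)
    ]


# A single impulse at row 1, column 0 of a 3x5 grid.
impulse = [[0.0] * 5 for _ in range(3)]
impulse[1][0] = 1.0
result = four_direction_scan(impulse, decay=0.5)
print(result[1][4])  # far end of the same row: 0.015625
print(result[0][4])  # pixel sharing neither row nor column: 0.0
```

In this toy setting, the impulse reaches the far end of its row after one update, while a purely local 3x3 kernel would need several iterations to cover the same distance. Each directional pass touches every pixel once, so the cost grows linearly with the number of pixels.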
