We present LightStereo, a stereo-matching network designed to accelerate the matching process. Unlike conventional methods that aggregate a computationally expensive 4D cost volume, LightStereo adopts a 3D cost volume as a lightweight alternative. Although similar approaches have been explored before, our contribution is to improve performance by focusing on the channel dimension of the 3D cost volume, which encodes the distribution of matching costs. We explore several strategies for increasing the capacity of this dimension while preserving both accuracy and efficiency. We compare LightStereo with existing state-of-the-art methods on multiple benchmarks, where it demonstrates superior performance in speed, accuracy, and resource utilization. On the SceneFlow dataset, LightStereo achieves a competitive end-point error (EPE) while requiring as few as 22 GFLOPs and an inference time of only 17 ms. Our analysis clarifies the effect of 2D cost aggregation on stereo matching and paves the way for real-world applications of efficient stereo systems. Code will be available at \url{https://github.com/XiandaGuo/OpenStereo}.
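To make the notion of a 3D cost volume concrete, the following is a minimal NumPy sketch, not the LightStereo implementation. It assumes a plain correlation cost over hypothetical feature maps of shape (C, H, W); the function names `correlation_cost_volume` and `soft_argmin_disparity` are illustrative only. The point it shows is that correlating features collapses the feature axis, so each pixel keeps one matching score per candidate disparity, and the disparity candidates become the channel dimension of a (D, H, W) volume that 2D convolutions can aggregate.

```python
import numpy as np


def correlation_cost_volume(left, right, max_disp):
    """Build a 3D correlation volume of shape (max_disp, H, W).

    left, right: feature maps of shape (C, H, W) (assumed inputs).
    Channel d holds the mean feature correlation between left pixel x
    and right pixel x - d; positions with x < d have no match and stay 0.
    """
    _, h, w = left.shape
    volume = np.zeros((max_disp, h, w), dtype=left.dtype)
    for d in range(max_disp):
        if d == 0:
            volume[d] = (left * right).mean(axis=0)
        else:
            volume[d, :, d:] = (left[:, :, d:] * right[:, :, :-d]).mean(axis=0)
    return volume


def soft_argmin_disparity(volume):
    """Regress disparity as the expectation over the channel dimension.

    A softmax over the disparity channel turns the matching scores into a
    per-pixel distribution (higher correlation means a better match).
    """
    shifted = volume - volume.max(axis=0, keepdims=True)  # numerical stability
    prob = np.exp(shifted)
    prob /= prob.sum(axis=0, keepdims=True)
    disparities = np.arange(volume.shape[0]).reshape(-1, 1, 1)
    return (prob * disparities).sum(axis=0)
```

A 4D cost volume would instead keep the feature axis, giving shape (C, D, H, W) and requiring 3D convolutions for aggregation; the 3D form above trades that extra axis for a much lower compute cost.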