We propose EventHub, a novel framework for training deep event stereo networks without ground-truth annotations from costly active sensors, relying instead on standard color images. From these images, we derive either both proxy annotations and proxy events, using state-of-the-art novel view synthesis techniques, or proxy annotations alone when the images are already paired with event data. Using the training set generated by our data factory, we repurpose state-of-the-art stereo models from the RGB literature to process event data, obtaining new event stereo models with unprecedented generalization capabilities. Experiments on widely used event stereo datasets support the effectiveness of EventHub and show that the same data distillation mechanism can improve the accuracy of RGB stereo foundation models in challenging conditions such as nighttime scenes.