Image-based depth estimation has gained significant attention in recent computer vision research for autonomous vehicles in intelligent transportation systems, owing to its cost-effectiveness and wide range of potential applications. Unlike binocular depth estimation methods, which require two fixed cameras, monocular depth estimation methods rely on a single camera, making them highly versatile. While state-of-the-art approaches to this task leverage self-supervised learning of deep neural networks in conjunction with tasks such as pose estimation and semantic segmentation, none has explored combining federated learning with self-supervision to train models on the unlabeled, private data captured by autonomous vehicles. Federated learning offers notable benefits, including enhanced privacy protection, reduced network consumption, and improved resilience to connectivity issues. To address this gap, we propose FedSCDepth, a novel method that combines federated learning and deep self-supervision to learn monocular depth estimators with effectiveness comparable to, and efficiency better than, current state-of-the-art methods. Our evaluation experiments on Eigen's Split of the KITTI dataset demonstrate that the proposed method achieves near state-of-the-art performance, with a test loss below 0.13, while requiring on average only 1.5k training steps and at most 0.415 GB of weight data transfer per autonomous vehicle in each round.
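To make the per-round weight exchange concrete, the following is a minimal sketch of the server-side aggregation step in a federated round. It assumes a FedAvg-style rule, that is, a weighted average of client weights proportional to local sample counts; the abstract does not state which aggregation rule FedSCDepth uses, so this is an illustration rather than the authors' method. The names `federated_average`, `Weights`, and the toy parameter `"w"` are hypothetical.

```python
from typing import Dict, List, Tuple

# Hypothetical representation: each model is a dict mapping a parameter
# name to a flat list of floats (a stand-in for real tensors).
Weights = Dict[str, List[float]]


def federated_average(updates: List[Tuple[Weights, int]]) -> Weights:
    """Average client weights, weighting each client by its sample count.

    Assumption: FedAvg-style aggregation; the abstract does not confirm
    the aggregation rule actually used by FedSCDepth.
    """
    if not updates:
        raise ValueError("need at least one client update")
    total = sum(n for _, n in updates)
    if total <= 0:
        raise ValueError("total sample count must be positive")

    # Start from zeros with the same shapes as the first client's weights.
    first_weights, _ = updates[0]
    averaged = {name: [0.0] * len(vals) for name, vals in first_weights.items()}

    # Accumulate each client's contribution, scaled by its data share.
    for weights, n in updates:
        share = n / total
        for name, vals in weights.items():
            acc = averaged[name]
            for i, v in enumerate(vals):
                acc[i] += share * v
    return averaged


# Two toy "vehicles": the second holds three times as much local data.
clients = [({"w": [1.0, 2.0]}, 1), ({"w": [3.0, 4.0]}, 3)]
print(federated_average(clients))  # {'w': [2.5, 3.5]}
```

In this kind of scheme only model weights travel between vehicles and the server, never the captured images, which is consistent with the privacy and bandwidth benefits the abstract attributes to federated learning.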