Relating Events and Frames Based on Self-Supervised Learning and Uncorrelated Conditioning for Unsupervised Domain Adaptation

Event-based cameras provide accurate, high-temporal-resolution measurements for computer vision tasks in challenging scenarios, such as high-dynamic-range environments and fast-motion maneuvers. Despite these advantages, deep learning for event-based vision faces a significant obstacle: annotated data is scarce because event-based cameras emerged relatively recently. Unsupervised domain adaptation offers an effective way around this limitation by leveraging the knowledge available in annotated data from conventional frame-based cameras. We propose a new algorithm for adapting a deep neural network trained on annotated frame-based data so that it generalizes well to unannotated event-based data. Our approach incorporates uncorrelated conditioning and self-supervised learning in an adversarial learning scheme to close the gap between the source and target domains. By applying self-supervised learning, the algorithm learns to align the representations of event-based data with those of frame-based camera data, thereby facilitating knowledge transfer. Furthermore, the inclusion of uncorrelated conditioning ensures that the adapted model effectively distinguishes between event-based and conventional data, enhancing its ability to classify event-based images accurately. Through empirical evaluation on two benchmarks, we demonstrate that our algorithm surpasses existing approaches designed for the same purpose. We attribute the superior performance of our solution to its ability to effectively utilize annotated data from frame-based cameras and transfer the acquired knowledge to the event-based vision domain.
