This dissertation presents four key contributions toward fairness and robustness in vision learning. First, to address the problem of large-scale data requirements, it introduces a novel Fairness Domain Adaptation approach built on two major research findings: Bijective Maximum Likelihood and Fairness Adaptation Learning. Second, to enable open-world modeling in vision learning, it presents a novel Open-world Fairness Continual Learning framework, the result of two research lines: Fairness Continual Learning and Open-world Continual Learning. Third, since visual data are often captured from multiple camera views, robust vision learning methods should model features that remain invariant across views; to this end, the thesis presents a Geometry-based Cross-view Adaptation framework for learning robust feature representations across views. Finally, with the recent growth of large-scale video and multimodal data, understanding the feature representations of large-scale visual foundation models and improving their robustness is critical. The thesis therefore presents novel Transformer-based approaches that improve the robustness of feature representations for multimodal and temporal data, followed by a Domain Generalization approach that improves the robustness of visual foundation models. Theoretical analysis and experimental results demonstrate the effectiveness of the proposed approaches and their superior performance over prior studies. Together, these contributions advance the fairness and robustness of machine vision learning.