Pursuing Counterfactual Fairness via Sequential Autoencoder Across Domains

Domain shift is a common challenge in machine learning, and various domain generalization (DG) techniques have been developed to improve the performance of machine learning systems on out-of-distribution (OOD) data. In real-world scenarios, moreover, data distributions can change gradually across a sequence of domains. Current methods focus primarily on improving model effectiveness in these new domains and often overlook fairness throughout the learning process. In response, we introduce a framework called Counterfactual Fairness-Aware Domain Generalization with Sequential Autoencoder (CDSAE). The approach separates environmental information and sensitive attributes from the embedded representation of classification features. This concurrent separation improves model generalization across diverse and unfamiliar domains and also addresses unfair classification. Our strategy is rooted in the principles of causal inference, which we use to tackle both issues. To examine the relationship between semantic information, sensitive attributes, and environmental cues, we categorize exogenous uncertainty factors into four latent variables: 1) semantic information influenced by sensitive attributes, 2) semantic information unaffected by sensitive attributes, 3) environmental cues influenced by sensitive attributes, and 4) environmental cues unaffected by sensitive attributes. By incorporating fairness regularization, we use only semantic information for classification. Empirical validation on synthetic and real-world datasets supports the effectiveness of our approach, showing improved accuracy while preserving fairness as domains evolve sequentially.
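The four-way latent split described above can be illustrated with a minimal NumPy sketch. It is not the CDSAE implementation: the block names, the block sizes, the logistic classifier head, and the `counterfactual_gap` measure are all assumptions introduced here for illustration, and the abstract does not specify the form of the fairness regularizer. The sketch shows only that a classifier reading the two semantic blocks is unaffected by changes to the environmental blocks, and how a prediction gap under a perturbed sensitive-influenced block might be measured.

```python
import numpy as np

# Hypothetical block names and sizes; the abstract gives no dimensions.
BLOCKS = {"sem_sens": 4, "sem_free": 4, "env_sens": 4, "env_free": 4}


def split_latent(z):
    """Split a (batch, 16) latent array into the four named blocks."""
    parts, start = {}, 0
    for name, size in BLOCKS.items():
        parts[name] = z[:, start:start + size]
        start += size
    return parts


def predict(parts, w, b):
    """Logistic classifier that reads only the two semantic blocks."""
    feats = np.concatenate([parts["sem_sens"], parts["sem_free"]], axis=1)
    return 1.0 / (1.0 + np.exp(-(feats @ w + b)))


def counterfactual_gap(p_factual, p_counterfactual):
    """Mean absolute prediction change between factual and counterfactual inputs.

    An assumed stand-in for a fairness penalty, not the paper's regularizer.
    """
    return float(np.mean(np.abs(p_factual - p_counterfactual)))


rng = np.random.default_rng(0)
z = rng.normal(size=(8, 16))   # stand-in for encoder output
w = rng.normal(size=8)         # weights over the 8 semantic dimensions
b = 0.0

parts = split_latent(z)
p = predict(parts, w, b)

# Replace both environmental blocks: predictions should not move.
z_env = z.copy()
z_env[:, 8:] = rng.normal(size=(8, 8))
p_env = predict(split_latent(z_env), w, b)

# Stand-in counterfactual: shift only the sensitive-influenced semantic block.
z_cf = z.copy()
z_cf[:, :4] += 1.0
p_cf = predict(split_latent(z_cf), w, b)
gap = counterfactual_gap(p, p_cf)
```

In a training loop, a term like `gap` could be added to the classification loss so that predictions become less sensitive to the sensitive-influenced block; how the counterfactual latent is actually generated is left open here because the abstract does not describe it.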
