Self-supervised learning (SSL) has become the de facto training paradigm for large models, where pre-training is followed by supervised fine-tuning using domain-specific data and labels. Despite demonstrating performance comparable to supervised methods, comprehensive efforts to assess SSL's impact on machine learning fairness (i.e., performing equally across different demographic breakdowns) are lacking. Hypothesizing that SSL models learn more generic, and hence less biased, representations, this study explores the impact of pre-training and fine-tuning strategies on fairness. We introduce a fairness assessment framework for SSL comprising five stages: defining dataset requirements, pre-training, fine-tuning with gradual unfreezing, assessing representation similarity conditioned on demographics, and establishing domain-specific evaluation processes. We evaluate the generalizability of our method on three real-world, human-centric datasets (MIMIC, MESA, and GLOBEM) by systematically comparing hundreds of SSL and fine-tuned models along dimensions spanning intermediate representations to appropriate evaluation metrics. Our findings demonstrate that SSL can significantly improve model fairness while maintaining performance on par with supervised methods, exhibiting up to a 30% increase in fairness with minimal loss in performance through self-supervision. We posit that these differences can be attributed to representation dissimilarities between the best- and worst-performing demographics across models, which are up to 13 times greater for protected attributes with larger performance discrepancies between segments.
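To make the representation-similarity stage concrete, below is a minimal sketch of how intermediate representations could be compared between two demographic groups using linear centered kernel alignment (CKA). The abstract does not specify the similarity measure, so the choice of CKA, the subsampling of groups to equal size, and the hypothetical `embed` helper are assumptions for illustration, not the paper's exact procedure.

```python
import numpy as np

def linear_cka(x: np.ndarray, y: np.ndarray) -> float:
    """Linear CKA between two representation matrices of shape
    (n_samples, n_features); both inputs must have the same n_samples."""
    # Center features before comparing representational structure.
    x = x - x.mean(axis=0, keepdims=True)
    y = y - y.mean(axis=0, keepdims=True)
    # CKA = ||Y^T X||_F^2 / (||X^T X||_F * ||Y^T Y||_F)
    cross = np.linalg.norm(y.T @ x, ord="fro") ** 2
    norm_x = np.linalg.norm(x.T @ x, ord="fro")
    norm_y = np.linalg.norm(y.T @ y, ord="fro")
    return float(cross / (norm_x * norm_y))

# Hypothetical usage (assumed helper `embed` returns a model's
# intermediate-layer activations for a batch of samples):
#   reps_a = embed(model, samples_group_a)          # (n_a, d)
#   reps_b = embed(model, samples_group_b)          # (n_b, d)
#   n = min(len(reps_a), len(reps_b))               # subsample to equal size
#   dissimilarity = 1.0 - linear_cka(reps_a[:n], reps_b[:n])
```

A higher dissimilarity between the best- and worst-performing groups would, under this sketch, correspond to the larger performance discrepancies reported for some protected attributes.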