Ensembling multiple Deep Neural Networks (DNNs) is a simple and effective way to improve top-line metrics and to outperform a single larger model. In this work, we look beyond top-line metrics and instead explore the impact of ensembling on subgroup performance. Surprisingly, we observe that even with a simple homogeneous ensemble (all the individual DNNs share the same training set, architecture, and design choices), minority-group performance improves disproportionately with the number of models compared with majority-group performance, i.e., fairness naturally emerges from ensembling. More surprisingly still, this gain persists even when a large number of models (e.g., $20$) is considered, although the average performance of the ensemble plateaus with fewer models. Our work establishes that simple DNN ensembles can be a powerful tool for alleviating the disparate impact of DNN classifiers, thus curbing algorithmic harm. We also explore why this is the case. We find that, even in homogeneous ensembles, varying the sources of stochasticity (parameter initialization, mini-batch sampling, and data-augmentation realizations) results in different fairness outcomes.
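The abstract measures how ensembling affects each subgroup rather than only top-line accuracy. A minimal sketch of that measurement follows. It assumes the ensemble combines members by averaging their softmax probabilities, since the abstract does not state the combination rule. It also assumes each member's test-set probabilities are already available, so training the members (each with its own seed for initialization, mini-batch order, and augmentation) is out of scope. The function names `ensemble_average`, `accuracy_by_group`, and `accuracy_vs_ensemble_size` are hypothetical. The probabilities in the demo are invented toy values chosen to exercise the code; they are not results from this work.

```python
from collections import defaultdict
from typing import Dict, List, Sequence

# Assumed layout: member_probs[m][i][c] is the probability that ensemble member m
# assigns to class c on example i. In a homogeneous ensemble the members share
# data, architecture, and hyperparameters and differ only in their random seed.
Probs = Sequence[Sequence[float]]


def ensemble_average(member_probs: Sequence[Probs]) -> List[List[float]]:
    """Average the class-probability vectors of all members, example by example."""
    n_members = len(member_probs)
    n_examples = len(member_probs[0])
    n_classes = len(member_probs[0][0])
    return [
        [sum(member_probs[m][i][c] for m in range(n_members)) / n_members
         for c in range(n_classes)]
        for i in range(n_examples)
    ]


def accuracy_by_group(probs: Probs, labels: Sequence[int],
                      groups: Sequence[str]) -> Dict[str, float]:
    """Top-1 accuracy computed separately for each subgroup label."""
    correct: Dict[str, int] = defaultdict(int)
    total: Dict[str, int] = defaultdict(int)
    for p, y, g in zip(probs, labels, groups):
        pred = max(range(len(p)), key=lambda c: p[c])  # argmax; first index wins ties
        correct[g] += int(pred == y)
        total[g] += 1
    return {g: correct[g] / total[g] for g in total}


def accuracy_vs_ensemble_size(member_probs: Sequence[Probs], labels: Sequence[int],
                              groups: Sequence[str]) -> List[Dict[str, float]]:
    """Per-group accuracy of the ensemble built from the first k members, k = 1..M."""
    return [
        accuracy_by_group(ensemble_average(member_probs[:k]), labels, groups)
        for k in range(1, len(member_probs) + 1)
    ]


if __name__ == "__main__":
    # Toy, hand-picked probabilities (2 classes, 4 examples, 3 members).
    # They only exercise the code and are not experimental measurements.
    labels = [0, 1, 0, 1]
    groups = ["majority", "majority", "minority", "minority"]
    member_probs = [
        [[0.90, 0.10], [0.20, 0.80], [0.40, 0.60], [0.70, 0.30]],  # seed 0
        [[0.80, 0.20], [0.30, 0.70], [0.70, 0.30], [0.40, 0.60]],  # seed 1
        [[0.85, 0.15], [0.25, 0.75], [0.60, 0.40], [0.30, 0.70]],  # seed 2
    ]
    for k, acc in enumerate(accuracy_vs_ensemble_size(member_probs, labels, groups), 1):
        print(k, acc)
```

On these toy inputs the script prints `1 {'majority': 1.0, 'minority': 0.0}`, then `2 {'majority': 1.0, 'minority': 0.5}`, then `3 {'majority': 1.0, 'minority': 1.0}`. Probability averaging ("soft voting") is only one common way to combine members. Majority voting or logit averaging are alternatives and can yield different per-group curves.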