To address the shortcomings of real-world datasets, robust learning algorithms have been designed to overcome arbitrary and indiscriminate data corruption. However, practical processes of gathering data may lead to patterns of data corruption that are localized to specific partitions of the training dataset. Motivated by critical applications where the learned model is deployed to make predictions about people from a rich collection of overlapping subpopulations, we initiate the study of multigroup robust algorithms whose robustness guarantees for each subpopulation only degrade with the amount of data corruption inside that subpopulation. When the data corruption is not distributed uniformly over subpopulations, our algorithms provide more meaningful robustness guarantees than standard guarantees that are oblivious to how the data corruption and the affected subpopulations are related. Our techniques establish a new connection between multigroup fairness and robustness.