We provide practical, efficient, and nonparametric methods for auditing the fairness of deployed classification and regression models. Whereas previous work relies on a fixed sample size, our methods are sequential and allow for the continuous monitoring of incoming data, making them well suited to tracking the fairness of real-world systems. We also allow the data to be collected by a probabilistic policy rather than sampled uniformly from the population. This enables auditing to be conducted on data gathered for another purpose. Moreover, this policy may change over time, and different policies may be used on different subpopulations. Finally, our methods can handle distribution shift resulting from either changes to the model or changes in the underlying population. Our approach is based on recent progress in anytime-valid inference and game-theoretic statistics, in particular the "testing by betting" framework. These connections ensure that our methods are interpretable, fast, and easy to implement. We demonstrate the efficacy of our approach on three benchmark fairness datasets.
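To make the "testing by betting" idea concrete, the following is a minimal sketch of a sequential audit, not the procedure from this work. It assumes a simplified setting in which each time step yields one model output in [0, 1] from each of two groups, and the null hypothesis is that the two groups have equal mean output. The function name `audit_by_betting`, its parameters, and the bet-selection heuristic are all hypothetical choices made for illustration. The auditor starts with wealth 1 and bets on the sign of the observed difference; under the null the wealth is a nonnegative martingale, so by Ville's inequality rejecting once it reaches 1/alpha keeps the false-alarm probability at or below alpha at any stopping time.

```python
def audit_by_betting(pairs, alpha=0.05, max_bet=0.5):
    """Sequentially test H0: both groups have the same mean model output.

    pairs: iterable of (y0, y1), model outputs in [0, 1] for one
    individual from each group (an assumed, simplified data format).
    Returns the 1-indexed time at which H0 is rejected, or None.
    """
    wealth = 1.0
    bet = 0.0      # fixed BEFORE seeing the next pair (predictable)
    sum_d = 0.0    # running sum of differences
    sum_d2 = 0.0   # running sum of squared differences
    for t, (y0, y1) in enumerate(pairs, start=1):
        d = y0 - y1                  # lies in [-1, 1]
        wealth *= 1.0 + bet * d      # stays positive since |bet| <= 0.5
        if wealth >= 1.0 / alpha:    # Ville's inequality threshold
            return t
        sum_d += d
        sum_d2 += d * d
        # Illustrative heuristic (an assumption, not the authors' rule):
        # bet roughly mean / second moment of past differences, clipped.
        raw = sum_d / (sum_d2 + 1.0)
        bet = max(-max_bet, min(max_bet, raw))
    return None


if __name__ == "__main__":
    # A maximally unfair stream: group 0 always scores 1, group 1 always 0.
    print(audit_by_betting([(1.0, 0.0)] * 100))
    # A perfectly balanced stream never accumulates wealth.
    print(audit_by_betting([(0.5, 0.5)] * 100))
```

Because the test can be checked after every observation, monitoring can stop as soon as the evidence is sufficient, which is the practical advantage of the sequential approach over fixed-sample-size tests. Handling data collected by a probabilistic policy or distribution shift would require further machinery that this sketch deliberately omits.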