Tabular data is prevalent in many high-stakes domains, such as financial services or public policy. Gradient-boosted decision trees (GBDT) are popular in these settings due to their performance guarantees and low cost. However, in consequential decision-making, fairness is a foremost concern. Despite GBDT's popularity, existing in-processing Fair ML methods are either inapplicable to GBDT, incur significant training-time overhead, or are inadequate for problems with high class imbalance, a typical issue in these domains. We present FairGBM, a dual ascent learning framework for training GBDT under fairness constraints, with little to no impact on predictive performance when compared to unconstrained GBDT. Since observational fairness metrics are non-differentiable, we employ a "proxy-Lagrangian" formulation using smooth convex error rate proxies to enable gradient-based optimization. Our implementation shows an order-of-magnitude speedup in training time when compared with related work, a pivotal aspect for fostering the widespread adoption of FairGBM by real-world practitioners.
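The descent/ascent structure described above can be illustrated with a small, self-contained sketch. It is not FairGBM's implementation or API; it assumes a logistic loss, a two-sided false-positive-rate (FPR) parity constraint with tolerance `eps`, the softplus function divided by log 2 as the smooth convex proxy for the step function 1[f ≥ 0], plain gradient steps, and depth-1 regression stumps standing in for full GBDT trees. All function names (`proxy_step`, `fit_stump`, `train_fair_boosting`), step sizes, and the synthetic data are hypothetical. The descent player fits each stump to the negative gradient of the proxy-Lagrangian with respect to the per-example scores, while the ascent player updates the multipliers using the true, non-differentiable FPRs.

```python
import numpy as np

LOG2 = np.log(2.0)


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def proxy_step(f):
    # Assumed proxy: softplus(f) / log(2) is smooth, convex, equals 1 at f = 0,
    # and upper-bounds the non-differentiable step function 1[f >= 0].
    return np.logaddexp(0.0, f) / LOG2


def fpr(pred_pos, y, mask):
    # Observational (non-differentiable) false-positive rate within a group.
    neg = mask & (y == 0)
    return pred_pos[neg].mean()


def fit_stump(X, r, n_thresholds=16):
    # Least-squares regression stump: a depth-1 stand-in for a GBDT tree.
    best, total, n = None, r.sum(), len(r)
    for j in range(X.shape[1]):
        for t in np.quantile(X[:, j], np.linspace(0.05, 0.95, n_thresholds)):
            left = X[:, j] <= t
            nl = left.sum()
            if nl == 0 or nl == n:
                continue
            sl = r[left].sum()
            gain = sl**2 / nl + (total - sl) ** 2 / (n - nl)
            if best is None or gain > best[0]:
                best = (gain, j, t, sl / nl, (total - sl) / (n - nl))
    _, j, t, vl, vr = best
    return lambda Z: np.where(Z[:, j] <= t, vl, vr)


def train_fair_boosting(X, y, s, eps=0.05, n_rounds=100, lr=0.1, lr_dual=0.5):
    n = len(y)
    f = np.zeros(n)
    # Multipliers for FPR_0 - FPR_1 <= eps and FPR_1 - FPR_0 <= eps.
    lam = np.zeros(2)
    neg0 = (s == 0) & (y == 0)
    neg1 = (s == 1) & (y == 0)
    for _ in range(n_rounds):
        # Descent player: gradient of n * proxy-Lagrangian w.r.t. each score f_i.
        grad = sigmoid(f) - y  # logistic-loss term
        dproxy = sigmoid(f) / LOG2  # derivative of proxy_step
        grad += (lam[0] - lam[1]) * neg0 * dproxy * n / neg0.sum()
        grad += (lam[1] - lam[0]) * neg1 * dproxy * n / neg1.sum()
        f += lr * fit_stump(X, -grad)(X)
        # Ascent player: projected gradient step on lambda using the TRUE FPRs.
        pos = f >= 0
        gap = fpr(pos, y, s == 0) - fpr(pos, y, s == 1)
        violation = np.array([gap - eps, -gap - eps])
        lam = np.maximum(0.0, lam + lr_dual * violation)
    return f, lam


# Hypothetical synthetic data: group 1 has a higher base rate, and feature 2
# leaks group membership, so an unconstrained model tends to raise group 1's FPR.
rng = np.random.default_rng(0)
n = 4000
s = rng.integers(0, 2, n)
y = (rng.random(n) < 0.25 + 0.3 * s).astype(float)
X = np.column_stack([y + rng.normal(0, 1.0, n), s + rng.normal(0, 0.3, n)])

f_unc, lam_unc = train_fair_boosting(X, y, s, eps=1.0)  # constraint never binds
f_fair, lam_fair = train_fair_boosting(X, y, s, eps=0.02)
for name, scores in [("unconstrained", f_unc), ("constrained", f_fair)]:
    pos = scores >= 0
    print(name, "FPR gap:", abs(fpr(pos, y, s == 0) - fpr(pos, y, s == 1)))
```

With `eps=1.0` the constraint can never be violated, so the multipliers stay at zero and the run reduces to ordinary boosting on the logistic loss; comparing its printed FPR gap against the `eps=0.02` run gives a rough sense of how the multipliers reshape the scores. Using the proxy only in the descent step and the true rates only in the ascent step is one common reading of a proxy-Lagrangian scheme; it keeps the tree-fitting gradients well defined while the multipliers still respond to the metric practitioners actually report.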