Tabular data is prevalent in many high-stakes domains, such as financial services or public policy. Gradient Boosted Decision Trees (GBDT) are popular in these settings due to their scalability, performance, and low training cost. While fairness is a foremost concern in these domains, existing in-processing Fair ML methods are either incompatible with GBDT or incur significant performance losses while taking considerably longer to train. We present FairGBM, a dual ascent learning framework for training GBDT under fairness constraints, with little to no impact on predictive performance when compared to unconstrained GBDT. Since observational fairness metrics are non-differentiable, we propose smooth convex error rate proxies for common fairness criteria, enabling gradient-based optimization using a ``proxy-Lagrangian'' formulation. Our implementation shows an order of magnitude speedup in training time relative to related work, which is pivotal for fostering the widespread adoption of FairGBM by real-world practitioners.
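To make the two ingredients named in the abstract concrete, the following is a minimal sketch, not the FairGBM implementation. It assumes a false positive rate constraint, a decision threshold at a score of 0, and a softplus-based surrogate chosen here only as one example of a smooth convex proxy; all function names are hypothetical. The step-function error rate has zero gradient almost everywhere, so the model update uses the smooth proxy, while the multiplier update (dual ascent) uses the exact constraint violation, which is the general idea behind a proxy-Lagrangian formulation.

```python
import math


def softplus(z):
    # Numerically stable log(1 + exp(z)).
    return max(z, 0.0) + math.log1p(math.exp(-abs(z)))


def sigmoid(z):
    # Numerically stable logistic function; the derivative of softplus.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fpr(scores, labels):
    """Exact false positive rate at threshold 0 (a step function of the scores)."""
    neg = [s for s, y in zip(scores, labels) if y == 0]
    return sum(1 for s in neg if s > 0) / len(neg)


def proxy_fpr(scores, labels):
    """Assumed surrogate: softplus(s) / log(2) is smooth, convex, and >= 1[s > 0]."""
    neg = [s for s, y in zip(scores, labels) if y == 0]
    return sum(softplus(s) for s in neg) / (len(neg) * math.log(2))


def proxy_fpr_grad(scores, labels):
    """Gradient of proxy_fpr w.r.t. each score; zero for label-positive rows."""
    n_neg = sum(1 for y in labels if y == 0)
    scale = 1.0 / (n_neg * math.log(2))
    return [sigmoid(s) * scale if y == 0 else 0.0 for s, y in zip(scores, labels)]


def dual_ascent_step(multiplier, violation, lr):
    """Projected gradient ascent on the multiplier, driven by the exact violation."""
    return max(0.0, multiplier + lr * violation)


# Hypothetical toy data: raw scores and binary labels.
scores = [-2.0, 0.5, 1.0, -0.3]
labels = [0, 0, 1, 0]
exact = fpr(scores, labels)
surrogate = proxy_fpr(scores, labels)
grads = proxy_fpr_grad(scores, labels)
```

In a boosting setting, per-sample gradients such as `grads` would be combined with the gradients of the predictive loss, weighted by the current multiplier, before fitting the next tree; how FairGBM does this exactly is not stated in the abstract.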