We propose MCGrad, a novel and scalable multicalibration algorithm. Multicalibration, meaning calibration within subgroups of the data, is an important property for the performance of machine learning-based systems. Existing multicalibration methods have so far gained limited traction in industry. We argue that this is because existing methods (1) require the subgroups to be specified manually, which ML practitioners often struggle with, (2) are not scalable, or (3) may harm other measures of model performance such as log loss and Area Under the Precision-Recall Curve (PRAUC). MCGrad does not require explicit specification of protected groups, is scalable, and often improves other ML evaluation metrics instead of harming them. MCGrad has been deployed in production at Meta and is now part of hundreds of production models. We present results from these deployments as well as results on public datasets. We provide an open-source implementation of MCGrad at https://github.com/facebookincubator/MCGrad.
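To make "calibration in subgroups" concrete, the following is a minimal sketch of how one might measure it. It is not MCGrad and does not use the MCGrad package. The helper names (`calibration_gap`, `subgroup_calibration_gaps`) and the toy data are hypothetical, and the gap used here (mean prediction minus observed positive rate) is a deliberate simplification: multicalibration as usually defined also conditions on the predicted score, not only on the subgroup.

```python
import numpy as np


def calibration_gap(y_true, y_pred):
    """Mean predicted probability minus observed positive rate (0 means calibrated on average)."""
    return float(np.mean(y_pred) - np.mean(y_true))


def subgroup_calibration_gaps(y_true, y_pred, groups):
    """Compute the calibration gap separately for each subgroup label (hypothetical helper)."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    groups = np.asarray(groups)
    return {
        str(g): calibration_gap(y_true[groups == g], y_pred[groups == g])
        for g in np.unique(groups)
    }


# Toy data (made up for illustration): group A has a 20% positive rate,
# group B has an 80% positive rate, and the model predicts 0.5 for everyone.
groups = np.array(["A"] * 10 + ["B"] * 10)
y_true = np.array([1] * 2 + [0] * 8 + [1] * 8 + [0] * 2)
y_pred = np.full(20, 0.5)

overall = calibration_gap(y_true, y_pred)
per_group = subgroup_calibration_gaps(y_true, y_pred, groups)

print("overall gap:", overall)
print("per-group gaps:", per_group)
```

In this toy setup the model looks calibrated overall (the overall gap is zero) while over-predicting for group A and under-predicting for group B by roughly 0.3 each. That is the kind of failure multicalibration targets. The groups here are specified by hand, which is exactly the requirement the abstract says MCGrad removes.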