As machine learning applications grow increasingly ubiquitous and complex, they face a growing set of requirements beyond accuracy. The prevalent approach to handling this challenge is to aggregate weighted penalties for requirement violations into the training objective. To be effective, this approach requires careful tuning of these hyperparameters (weights) through trial and error and cross-validation, which becomes impractical even for a moderate number of requirements. These issues are exacerbated when the requirements involve parities or equalities, as is the case in fairness and boundary value problems. An alternative technique formulates these learning problems using constrained optimization. Yet existing approximation and generalization guarantees do not apply to problems involving equality constraints. In this work, we derive a generalization theory for equality-constrained statistical learning problems, showing that their solutions can be approximated using samples and rich parametrizations. Using these results, we propose a practical algorithm based on solving a sequence of unconstrained, empirical learning problems. We showcase its effectiveness and the new formulations enabled by equality constraints in fair learning, interpolating classifiers, and boundary value problems.
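The abstract describes the proposed method only at a high level: an equality-constrained learning problem is tackled by solving a sequence of unconstrained, empirical learning problems. The sketch below is a minimal illustration of what such a scheme can look like, assuming a dual-ascent-style loop that alternates between approximately minimizing an empirical Lagrangian and updating the multiplier with the signed constraint violation. The logistic model, the demographic-parity-style equality constraint, the synthetic data, and all step sizes are illustrative assumptions, not the paper's actual algorithm or experiments.

```python
# Illustrative sketch only: a dual-ascent loop for an equality-constrained
# learning problem, solved as a sequence of unconstrained empirical problems.
# Model, constraint, data, and step sizes are assumptions for demonstration.
import numpy as np

rng = np.random.default_rng(0)

# Synthetic binary classification data with a binary group attribute.
n, d = 2000, 5
X = rng.normal(size=(n, d))
g = rng.random(n) < 0.5                      # group membership
w_true = rng.normal(size=d)
y = (X @ w_true + 0.5 * g + 0.1 * rng.normal(size=n) > 0).astype(float)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lagrangian_grad(w, lam):
    """Empirical Lagrangian: logistic loss + lam * (equality-constraint slack)."""
    p = sigmoid(X @ w)
    grad_loss = X.T @ (p - y) / n            # gradient of mean logistic loss
    # Equality constraint c(w) = 0: equal mean scores across the two groups
    # (a demographic-parity-style constraint, used purely for illustration).
    c = p[g].mean() - p[~g].mean()
    dp = p * (1 - p)                         # sigmoid derivative
    grad_c = (X[g] * dp[g, None]).mean(0) - (X[~g] * dp[~g, None]).mean(0)
    return grad_loss + lam * grad_c, c

w, lam = np.zeros(d), 0.0
eta_w, eta_lam = 0.5, 0.5                    # primal and dual step sizes (illustrative)

for t in range(200):                         # outer dual-ascent iterations
    # (1) Approximately solve the unconstrained empirical problem min_w L(w, lam).
    for _ in range(50):
        grad, _ = lagrangian_grad(w, lam)
        w -= eta_w * grad
    # (2) Dual update with the signed violation; for equality constraints the
    #     multiplier is unrestricted in sign, so no projection is needed.
    _, c = lagrangian_grad(w, lam)
    lam += eta_lam * c

print(f"constraint slack: {c:+.4f}, multiplier: {lam:+.3f}")
```

In this kind of scheme, each inner loop is an ordinary unconstrained empirical risk minimization (here plain gradient descent on the Lagrangian), while the outer multiplier update steers the solution toward satisfying the equality constraint; how closely this mirrors the paper's algorithm and guarantees is determined by the paper itself, not by this sketch.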